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<H Abstract. We propose a method for automatically generating abstract transformers for 

vQ static analysis by abstract interpretation. The method focuses on linear constraints on 

rvj programs operating on rational, real or floating-point variables and containing linear as- 

signments and tests. Given the specification of an abstract domain, and a program block, 
I I our method automatically outputs an implementation of the corresponding abstract trans- 

om former. It is thus a form of program transformation. 

I_J In addition to loop-free code, the same method also applies for obtaining least fixed 

J points as functions of the precondition, which permits the analysis of loops and recursive 

^ functions. 

' ' The motivation of our work is data-fiow synchronous programming languages, used 

for building control-command embedded systems, but it also applies to imperative and 
functional programming. 

Our algorithms are based on quantifier elimination and symbolic manipulation tech- 
.^iL niques over linear arithmetic formulas. We also give less general results for nonlinear 

QQ constraints and nonlinear program constructs. 

in 

o 

O 1- Introduction 

• • Program analysis consists in deriving properties of the possible executions of a program 

.^H from an algorithmic processing of its source or object code. Example of interesting proper- 

/\ ties include: "the program always terminates"; "the program never executes a division by 

zero" ; "the program always outputs a well- formed XML document" ; "variable x always lies 
between 1 and 3". There has been a considerable amount of work done since the late 1970s 
on sound methods of program analysis — that is, methods that produce results that are 
guaranteed to hold for all program executions, as opposed to bug finding methods such as 
program testing, which cannot provide such guarantees in general. 

Static analysis by abstract interpretation is one of the various approaches to sound 
program analysis. Grossly speaking, abstract interpretation casts the problem of obtaining 
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supersets of the set of reachable states of programs into a problem of finding fixed points 
of certain monotone operators on certain ordered sets, known as abstract domains. When 
dealing with programs operating on arithmetic values (integer or real numbers, or, more 
realistically, bounded integers and floating-point values), these sets are often defined by 
numerical constraints, and ordered by inclusion. One may, for instance, attempt to compute, 
for each program point and each variable, an interval that is guaranteed to contain all 
possible values of that variable at that point. The problem is of course how to compute 
these fixed points. Obviously, the smaller the intervals, the better, so we would like to 
compute them as small as possible. Ideally, we would like to compute the least fixed point, 
that is, the least inductive invariant that can be expressed using intervals. 

The purpose of this article is to expose how to compute such least fixed points exactly, at 
least for certain classes of programs and certain abstract domains. Specifically, we consider 
programs operating over real variables using only linear comparisons (e.g. x + 2y < 3 
but not x'^ < 2/), and abstract domains defined using a finite number of linear constraints 
Yli ^i'^i — ^1 where the a, are fixed coefficients, C is a parameter (whose computation is the 
goal of the analysis) and the Vi are the program variables. Such domains evidently include 
the intervals, where the constraints are of the form Vi < C and —Vi < C. 

Not only can we compute such least fixed points exactly if all parameters are known, 
but we can also deal with the case where some of the parameters are unknowns, in which 
case we obtain the parameters of the least fixed point as explicit, algorithmic, functions 
of the unknowns. We can thus generate, once and for all, the abstract transformers for 
blocks of code: that is, those that map the parameters in the precondition of the block of 
code to a suitable postcondition. For instance, in the case of interval analysis, we derive 
explicit functions mapping the bounds on variables at the entrance of a loop-free block to 
the tightest bounds at the outcome of the block, or to the least inductive interval invariant 
of a loop. This allows modular analysis: it is possible to analyse functions or other kind of 
program modules (such as nodes in synchronous programming) in isolation of the rest of 
the code. 

A crucial difference with other methods, even on loop-free code, is that we derive the 
optimal — that is, the most precise — abstract transformer for the whole sequence. In 
contrast, most static analyses only have optimal transformers for individual instructions; 
they build transformers for whole sequences of code by composition of the transformers for 
the individual instructions. Even on very simple examples, the optimal transformer for a 
sequence is not the composition of the optimal transformers of the individual instructions. 
Furthermore, most other methods are forced to merge the information from different execu- 
tion traces at the end of "if-then-else" and other control flow constructs in a way that loses 
precision. In contrast, our method distinguishes different paths in the control flow graph as 
long as they do not go through loop iteration. 

Our approach is based on quantifier elimination, a technique operating on logical for- 
mulas that transforms a formula with quantifiers into an equivalent quantifier-free formula: 
for instance, \/x {x > y ^ x > 1) is turned into the equivalent y > 1. Essentially, we define 
the result that we would like to obtain using a quantified logical formula, and, using quanti- 
fier elimination and further formula manipulations, we obtain an algorithm that computes 
this result. Thus, one can see our method as automatically transforming a non executable 
specification of the result of the analysis into an algorithm computing that result. 

In ^ we shall provide background about abstract interpretation and quantifier elimi- 
nation. In ^ we shall provide the main results of the article, that is, how to derive optimal 
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abstract transformers for template linear constraint domains. In Q we shall investigate 
various extensions to that framework, still based on quantifier elimination in linear real 
arithmetic; e.g. how to deal with floating-point values. In ^ we shall explain how to move 
from the single loop case to more complex control flow. In ^ we shall describe our imple- 
mentation of the main algorithm and some limited extensions. In ^ we shall investigate 
extensions of the same framework using quantifier elimination on other theories, namely 
Presburger arithmetic and the theory of real closed fields (nonlinear real arithmetic). In 
^ we shall list some related work and compare our method to other relevant approaches. 
Finally, ^concludes. 

This article is based upon two conference articles f8^ [8^ . 

2. Background 

In this section, we shall recall a few definitions, notations and results on program 
analysis by abstract interpretation, then quantifier elimination. 

2.1. Abstract interpretation. It is well-known that, in the general case, fully automatic 
program analysis is impossible for any nontrivial property|J Thus, all analysis methods 
must have at least one of the following characteristics: 

• They may bound the memory size of the studied program, which then becomes 
a finite automaton, on which most properties are decidable. Explicit- state model- 
checking works by enumerating all reachable states, which is tractable only while 
implicit state model- checking represents sets of states using data structures such as 
binary decision diagrams p7j. 

• They may restrict the programming language used, making it not Turing-complete, 
so that properties become decidable. For instance, reachability in pushdown au- 
tomata is decidable even though their memory size is unbounded [17]. 

• They may restrict the class of properties expressed to properties of bounded execu- 
tions; e.g., "within the first 10000 steps of execution, there is no division by zero", 
as in bounded model checking [13j. 

• They may be unsound as proof methods: they may fail to detect that the desired 
property is violated. Typically, bug-finding and testing programs are in that cat- 
egory, because they may fail to detect a bug. Some such analysis techniques are 
not based on program semantics, but rather on finding patterns in the program 
syntax [38j . 

• They may be incomplete as proof methods: they may fail to prove that a certain 
property holds, and report spurious violations. Methods based on abstraction fall 
in that category. 

Abstraction by over-approximation consists in replacing the original problem, unde- 
cidable or very difficult to decide, by a simpler "abstract" problem whose behaviours are 
guaranteed to include all behaviours of the original problem. 

This result, formally given within the framework of recursive function theory, is known as Rice's theo- 
rem [96, p. 34] j92, corollary B]. It is obtained by generalisation from Turing's halting theorem. Interpreted 
upon program semantics, the theorem states that the only properties of the denotational semantics of pro- 
grams that can be algorithmically decided on the source code are the trivial properties: uniformly "true" or 
uniformly "false". 



DAVID MONNIAUX 



Listing 1: Original program 



int X, y; 




bool a; 




if (x < y) a = 


false ; 





Listing 


2: Boolean program 


bool a; 

if (nondet 


0) a = false; 



Figure 1 : Transformation of a program into a Boolean program by erasing the numeric part 
and replacing tests over numerical quantities by nondeterministic choice (nondet() 
nondeterministically returns true or false). 



An example of an abstraction is to erase from the program all constructs dealing with 
numerical and pointer types (or replacing them with nondeterministic choices, if their value 
is used within a test), keeping only Boolean types (Fig. [l]). Obviously, the behaviours of 
the resulting program encompass all the behaviours of the original program, plus maybe 
some extra ones. 

Further abstraction can be applied to this Boolean program: for instance, the "3-value 
logic" abstraction [91| which maps any input or output variable to an abstract parameter 
taking its value in a 3-element set: "is 0", "is 1", "can be or l"jJfor practical purposes 
it may be easier to encode these values using a couple of Booleans, respectively meaning 
"can be 0" and "can be 1", thus the abstract values (1,0), (0,1) and (1,1). The abstract 
value (0, 0) obtained for any variable at a program point means that this program point 
is unreachable. Given a vector of input abstract parameters, one for each input variable 
of the program, the forward abstract transfer function gives a correct vector of output 
abstract parameters, one for each output variable of the program. Quite obviously, in the 
absence of loops, it is possible to obtain a suitable forward abstract transfer function by 
applying simple logical rules to the Boolean function defined by the Boolean program. An 
effective implementation of the forward abstract transfer function can thus be obtained by 
a transformation of the source program. 

For programs operating over numerical quantities, a common abstraction is inter- 
vals. |33l EH To each input x, one associates an interval [xmmj^Jmax], to each output 
x' an interval [a^mm'^max]- How can one compute the {x'^^^, x'j^g^^)x£V bounds from the 
(a^min, a;max)zGy? The most common method is interval arithmetic: to each elementary 
arithmetic operation, one attaches an abstract counterpart that gives bounds on the output 
of the operation given bounds on the inputs. For instance, if one knows [amim Omax] and 

[bmm,brna,x], and C = a + b, then one computes [Cmin,Cmax] as follows: Cmin = flmin + ^min 

and Cmax = flmax + ^max- If a program point can be reached by several paths (e.g. at the 
end of an if-then-else construct), then a suitable interval [xmim a^max] can be obtained by a 
join of all the intervals for x at the end of these paths: [a, b] n [c, d] = [min(a, c), max(6, d)]. 
Again, for a loop-free program, one can obtain a suitable effective forward abstract transfer 
function by a program transformation of the source code. 

The abstract transfer function defined by interval arithmetic is always correct, but is not 
necessarily the most precise. For instance, on example in Fig. [2| the best abstract transfer 
function maps any input range to Zmin = -Zmax = 0, since the output z is always zero. 



For brevity, we identify "false" with and "true" with 1. 
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Listing 3: 


Program computing zero 


double X, 


y, z; 


/* X lies 


in [a^miri) S^maxJ 1 */ 


y = x; 




z = x-y; 







Listing 4 


Interval transformer 


ymin 


^ S^niin / 




ymax 


^ XrcLia 1 




^min 


^ ^min 


Umax 1 


^max 


^ ^Jniax ~ 


Vvaxa ) 



Figure 2: The transformer for the interval domain obtained by composition of locally opti- 
mal abstract transformers is imprecise. For each statement (on the left) we use 
a corresponding optimal transformer (on the right), but the composition of these 
transformers is not optimal. For the sake of simplicity, all variables are considered 
to be real numbers. 



while the one obtained by applying interval arithmetic to all program statements yields, in 
general, larger intervals. The weakness of the interval domain on this example is evidently 
due to the fact that it does not keep track of relationships between variables (here, that 
X = y). Relational abstract domains such as the octagons [731 [76l [77] or convex polyhedra 
in [Ml [58] address this issue. Yet, neither octagons nor polyhedra provide analyses that are 
guaranteed to give optimal results. 
Consider the following program: 



Listing 5: Ue 


distinguisl 


led paths 


int x; 






if (x > 0) x= 1; else x= -1; 






if (x == 0) x= 2; 







Obviously, the second test can never be taken, since x can only be ±1; however an interval, 
octagon or polyhedral analysis will conclude, after joining the informations for both branches 
of the first test, that x lies in [—1, 1] and will not be able to exclude the case x = 0. The 
final invariant will therefore be x € [—1, 2]. 

In contrast, our method, applied to this example, will correctly conclude that the 
optimal output invariant for x is [—1,1]. In fact, our method yields the same result as 
considering the (potentially exponentially large) set of paths between the beginning and 
the end of the program, and for each path, computing the least output interval, then 
computing the join of all these intervals. 

We have so far left out programs containing loops; when programs contain loops or 
recursive functions, a central problem of program analysis is to find inductive invariants. In 
the case of Boolean programs, given constraints on the input parameters, the set of reachable 
states can be computed exactly by model-checking algorithms; yet, these algorithms do not 
give a closed- form representation of the abstract transfer function mapping input parameters 
to output parameters for the 3- value abstraction. In the case of numerical abstractions such 
as the intervals, octagons or polyhedra, the most common way to find invariants is through 
the use of a widening operator [3H [35] . 

Intuitively, widening operators observe the sets of reachable states after A^ and A^ + 1 
loop iterations and extrapolate them to a "candidate invariant" . For instance, the widening 
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Figure 3: The standard widening on convex polyhedra |36|l58j. here demonstrated on poly- 
hedra in dimension 2 (polygons). The widening operator observes the sets of 
reachable states Pq and Pi at two consecutive iterations, and keeps only the con- 
straints (polyhedral faces, here polygon edges) that are stable across iterations. 
The resulting Poo polyhedron is then proposed as an invariant. 



operator, observing a sequence of intervals [0, 1], [0,2], [0,3] may wish to try [0, +oo). See 
Fig. [3] for an example with the standard widening operator on convex polyhedra. 

Let no be the set of initial states of a loop, and let — )•,- be transition relation for this 
loop (o" — )-,- a' means that a' is reachable in one loop step from a). The set of reachable 
states at the head of the loop is the least fixed point of / : u i— )• n U {a' \ 3a £ uAa — )•,- a'}, 
which is obtained as the limit of the ascending sequence defined by «„+! = f{un)- By 
abstract interpretation, we replace this sequence by an abstract sequence Un defined by 
Un_^_i = P{un), such that for any n, Un is an abstraction of Un- If this sequence is stationary, 

that is, Ui^_^_i = n^ for some A'', then u^^ is an abstraction of the least fixed point of / and 
thus of the least invariant of the transition relation r containing uq. 

When one uses a widening operator V, the function /^(njj) is defined as Un\/ fp{un) 
where fp is an abstraction of /. The design of the widening operator ensures convergence of 
Un in finite time. The exceptions to the use of widening operators are static analysis domains 
that satisfy the ascending condition, such as the domain of linear equality constraints [65] 
and that of linear congruence constraints [55] : with /^ = fp the sequence Un always becomes 
stable within finite time. 
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Listing 6: Circular buffer 



i = 0; 

while (true) { 
if (nondetO) { 
i = 1+1; 

if (1 >= 10) 1=0; 
} 
} 



Again, widening operators provide correct results, but these results can be grossly 
over-approximated. Much of the literature on applied analysis by abstract interpretation 
discusses workarounds that give precise results in certain cases: narrowing iterations [33113^. 
widening "up to" [571 §3.2], "delayed" or with "thresholds" [E], etc. 

A simple example of a program with one single variable where narrowing iterations fail 
to improve precision is Listing [6j This program is a much simplified version of a piece 
of code maintaining a circular buffer inside a large reactive control loop, where some piece 
of data is inserted only at certain loop iterations. We only kept the instructions relevant 
to the array index i and abstracted away the choice of the loop iterations where data is 
inserted as nondeterministic nondet(). 

If we analyse this loop using intervals with the standard widening with a widening 
point at the head of the loop, we obtain the sequence [0, 0], [0, 1], [0, 2] and then widening 
to [0, +00). Narrowing iterations then fail to increase the precision. The reason is that the 
transition relation for this loop includes the identity function (when nondet() is false), thus 
the concrete function whose least fixed point defines the set of reachable states [35] satisfies 
X C <J3{X) for all X (in other word, 4> is expansive). Thus, once the widening operator 
overshoots the least fixed point, it can never recover it. 

A similar problem is posed by: 



i 


= 0; 








w 


hile 

i = 


( true 

i + 1; 


) { 






if ( 


i == 


10) 


i=0; 


} 











The usual widening iterations overshoot to [0,-|-oo), and narrowing does not recover from 
there. Furthermore, this example illustrates how widening makes analysis non monotonia: 
contrary to one could expect, having extra precision on the precondition of a program can 
result in worse precision for the inferred invariants. For instance, consider the above problem 
and replace the first line by assume(i>=0 && i<=9). Clearly, the resulting program is an 
abstraction of the above example, since it has strictly more behaviours (we allow 1, . . . , 9 
as initial values for i). Yet, the analysis of the loop will yield a more precise behaviour: the 
interval [0, 9] is stable and the analysis terminates immediately. 

Both these very simple examples can be precisely analysed using widening up to [571 
§3.2], also known as widening with thresholds J15i Sec. 7.1.2]: a preliminary phase collects 
all constants to which i is compared, and instead of widening to +00, one widens to the 
next larger such constant if it exists — in this case, since x < 10 stands for 2; < 9, widening 
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with threshold would widen to 9, which is the correct value. This approach is not general — 
it fails if instead of the constant 10 we have some computed value. Of course, improvements 
are possible: for instance, one could analyse all the program up to this loop in order to 
prove that certain variables are constant, then use this information for setting thresholds 
for further loops. Yet, again, this is not a general approach. 

This is the second problem that this article addresses: how to obtain, in general, optimal 
invariants for certain classes of programs and numerical constraints. Furthermore, our 
methods provide these invariants as functions of the parameters of the precondition of the 
loop; this is one difference with our proposal, which computes the best, thus, again, they 
provide effective, optimal abstract transfer functions for loop constructs. 

2.2. Quantifier elimination. Consider a set A of atomic formulas. The set U{A) of 

quantifier-free formulas is the set of formulas constructed from A using operators A, V and 
-i; the set Q{A) of quantified formulas is the set of formulas constructed from A using the 
above operators and the 3 and V quantifiers. Such formulas are thus trees whose leaves are 
the atomic formulas. A literal is an atomic formula or the negation thereof. The set of 
free variables FV{F) of a formula F is defined as usual. A quantifier-free formula without 
variables is said to be ground. A formula without free variables is said to be closed; the 
existential closure of a formula F is F with existential quantifiers for all free variables 
prepended; the universal closure is the same with universal quantifiers. A quantifier-free 
formula is said to be in disjunctive normal form (DNF) if it is a disjunction of conjunctions, 
that is, is of the form {li^i A • • • A /i,ni) V • • • V {lm,i A • • • A lm,nm)i ^-i^d is said to be 
in conjunctive normal form (CNF) if it is a conjunction of disjunctions. Any quantifier- 
free formula can be converted into CNF or DNF by application of the distributivity laws 
{AvB)AC = {AAC)V{BAC)and{AAB)vC={AvC)A{BV C), though better 
algorithms exist, such as ALL-SAT modulo theory |81j . 



2.2.1. Linear real inequalities. Let A be the set of linear inequalities with integer or rational 
coefficients over a set of variables V. By elementary calculus, such inequalities can be 
equivalently written in the following forms: l{vi, V2, . . .) > C or l{vi,V2, . . .) > C, with / 
a linear expression with integer coefficients over V and C a constant. Let us first consider 
the theory of linear real arithmetic (LRA): models of a formula F are mappings from F to 
the real field M, and notions of equivalence and satisfiability follow. Note that satisfiability 
and equivalence are not affected by taking models to be mappings from F to the rational 
field Q. Deciding whether a LRA formula is satisfiable is, again, a NP-complete problem 
known as satisfiability modulo theory (SMT) of real linear arithmetic. Again, practical 
implementations, known as SMT-solvers , are capable of dealing with rather large formulas; 
examples include Yices [35] and Z3 [41jn 

The theory of linear real arithmetic admits quantifier elimination. For instance, the 
quantified formula Wx (x > y =^ a; > 3) is equivalent to the quantifier- free formula y > 3. 
Quantified linear real arithmetic formulas are thus decidable: by quantifier elimination, one 
can convert the existential closure of the formula to an equivalent ground formula, the truth 
of which is trivially decidable by evaluation. The decision problem for quantified formulas 



^The yearly SMT-COMP' competition has SMT-solvers compete on a large set of benchmarks. The SMT-' 



LIB site [5] documents various theories amenable to SMT, including large libraries of benchmarks. Kroening 



and Strichman 66 is an excellent introductory material to the algorithms behind SMT-solving. 
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over rational linear inequalities requires at least exponential time, thus quantifier elimination 
is at least exponential. [191 §7-4]. [IHl Th. 3] Weispfenning |107j discusses complexity issues 
in more detail. 

Again, most quantifier elimination algorithms proceed by induction over the structure 
of the formula, and thus begin by eliminating the innermost quantifiers, progressively re- 
placing branches of the formula containing quantifiers by quantifier- free equivalent branches. 
By application of the equivalence \/x F = -i3x -iF, one can reduce the problem to elimi- 
nating existential quantifiers only. Consider now the problem of eliminating the existential 
quantifiers from 3 xi . . .Xn F where F is quantifier-free. We can first convert into DNF: 
3x1 ■ ■ -Xn (Ci V • • • V Cm) where the Ci are conjunctions, then to the equivalent formula 
(3 xi . . . Xn Ci) V • • • V (3 xi . . . Xn Cm)- We thus have reduced the quantifier elimina- 
tion problem for general formula to the problem of quantifier elimination for conjunctions 
of linear inequalities. Remark that, geometrically, this quantifier elimination amounts to 
computing the projection of a convex polyhedron along the dimensions associated with the 
variables xi, . . . ,x„, with the original polyhedron and its projection being defined by sys- 
tems of linear inequalities. The Fourier-Motzkin elimination procedure [66|, §5.4] converts 
3x1 ■ ■ -Xn C into an equivalent conjunction. This is what we refer to the "conversion to 
DNF followed by projection approach". This approach is good for quickly proving that 
the theory admits quantifier elimination, but it is very inefficient. We shall now see better 
methods. 

Ferrante and Rackoff ^ proposed a substitution method [El §7.3.1] [571 ^4.2] [T07]: a 
formula of the form 3x F where F is quantifier-free is replaced by an equivalent disjunction 
F[ei/x]y ■ ■ -y F[en/x], where the ej are affine linear expressions built from the free variables 
of 3x F. Note the similarity to the naive elimination procedure we described for Boolean 
variables: even though the existential quantifier ranges over an infinite set of values, it 
is in fact only necessary to test the formula -F at a finite number of points (see Fig. H|. 
The drawback of this algorithm is that n is proportional to the square of the number of 
occurrences of x in the formula; thus, the size of the formula can be cubed for each quantifier 
eliminated. Loos and Weispfenning |69] proposed a virtual substitution algorithnij [87, §4.4] 
that works along the same general ideas but for which n is proportional to the number of 



occurrences of x in the formula. Our benchmarks show that Loos and Weispfenning 
algorithm is generally much more efficient than Ferrante and Rackofff s, despite the latter 
method being better known. |80t [HI] 

The main drawback of substitution algorithms is that the size of the formulas generally 
grows very fast as successive variables are eliminated. There are few opportunities for 
simplifications, save for replacing inequalities equivalent to false (e.g. < 0) by false and 
similarly for true, then applying trivial rewrites (e.g. false V x -^ x, false Ax -^ false). Our 
experience is that these algorithms tend to terminate with out-of- memory |80L IHT] . Scholl 
et al. |100j have proposed using and-inverter graphs (AIGs) and SAT modulo theory (SMT) 
solving in order to simplify formulas and keeping their size manageable. 

Another class of algorithms improve on the conversion to DNF then projection ap- 
proach, by combining both phases: we proposed an eager algorithm based on this idea [8lj , 

This method replaces a; by a formula that does not evaluate to a rational number, but to a sum of 
a rational number and optionally an infinitesimal e, taken to be a number greater than but less than 
any positive real; the infinitesimals are then erased by application of the rules governing comparisons. In 
practical implementations, one does both substitution and erasure of infinitesimals in one single pass, and 
infinitesimals never actually appear in formulas; thus the phrase virtual substitution. 
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Figure 4: The gray zone S is the set of (x, y) solutions of formula F, whose atoms are the 
linear inequalities corresponding to the lines A drawn with dashes. For fixed 
y = yo, the set of x such that F{x,y) is true is made of intervals whose ends lie 
within the set / of intersections of the y = yo line with the lines in A, drawn 
with a small circle, y = yo therefore has an intersection with S if and only if 
a point in /, or an interval with both ends in / U {— oo,+oo}, lies within S. 
This condition can be tested using x — )• ±00 and all midpoints to intervals with 
both ends in /, as per Ferrante and Rackoff |47j . or, in addition to / U {—00}, 
for any element of I a point infinitesimally close to the right of it, as per Loos 
and Weispfenning [69] . Both methods exploit the fact that the coordinates of all 
points from / (intersection of y = yo and a line from A) can be expressed as affine 
linear functions of yo- 



then a lazy version, which computes parts of formulas as needed [80]. Instead of syntactic 
conversion to DNF, we use a SMT solver to point to successive elements of the DNF, and 
instead of using Fourier-Motzkin elimination, which tends to needlessly blow up the size of 
the formulas, we use libraries for computations over convex polyhedra, which can compute 
a minimal constraint representation of the projection of a polyhedron given by constraints 
(that is, inequalities) — which is the geometrical counterpart of performing elimination of 
a block of existentially quantified variables. Experiments have shown that such methods 
are generally competitive with substitution approaches, with different classes of benchmarks 
showing a preference for one of the two families of methods [8OJ ; for the kinds of problems we 
consider in this article, it seems that the SMT+projection methods are more efficient |81| . 
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2.2.2. Preshurger arithmetic. The theory of linear integer arithmetic (LIA), also known as 
Presburger arithmetic, has the same syntax for formulas, but another semantics, replacing 
rational numbers (Q) by integers (Z). Linear inequalities are then insufficient for quantifier 
elimination — we also need congruence constraints: 3k x = 2k simplifies to x = (mod 2). 

Decision of formulas in Presburger arithmetic is doubly exponential, and thus quantifier 
elimination is very expensive in the worst case |48j. Presburger |89j provided a quantifier 
elimination procedure, but its complexity was impractical; Cooper [30] proposed a better 
algorithm ^ §7.2]; Pugh |90] proposed the "Omega test" [Ml §5.5]. 

The practical complexity of Presburger arithmetic is high. In particular, the formulas 
produced tend to be very complex, even when there exists a considerably simpler and 
"understandable" equivalent formula, as seen with experiments in §7.1[ 

Cooper and Pughf s procedures are very geometrical in nature. Integers, however, can 



also be seen as words over the {0, 1} alphabet, and sets of integers can thus be recognised 
by finite automata |88^ §8]. Addition is encoded as a 3-track automaton recognising that 
the number on the third track truly is the sum of the numbers on the first two tracks; 
equivalently, this encodes subtraction. Existential quantifier elimination just removes some 
of the tracks, making transitions depending on bits read on that track nondeterministic. 
Negation is complementation (which can be costly, thus explaining the high cost of quantifier 
alternation). Multiplication by powers of two is also easily encoded, and multiplications by 
arbitrary constants can be encoded by a combination of additions and multiplications by 
powers of twojj The same idea can be extended to real numbers written as their binary 
expansion, using automata on infinite words. 

This leads to an interesting arithmetic theory, with two sorts of variables: reals (or 
rationals) and integers. This could be used to model computer programs, with integers for 
integer variables and reals for floating-point variables (if necessary by using the semantic 



transformations described in § 4.5). Boigelot et al. jTB] described a restricted class of w- 
automata sufficient for quantifier elimination. Becker et al. ^llj implemented the Lira 
tool based on such ideas. Unfortunately, this approach suffers from two major drawbacks: 
the practical performances are very bad for purely real problems |81j, and it is impossible 
to recover an arithmetical formula from almost all these automata. We therefore did not 
pursue this direction. 

2.2.3. Nonlinear real arithmetic. What happens if we do not limit ourselves to linear arith- 
metic, but also allow polynomials? Over the integers, the resulting theory is known as 
Peano arithmetic. It is well known that there can exist no decision procedure for quantified 
Peano arithmetic formulas jj Since a quantifier elimination algorithm would turn a quanti- 
fied formula without free variables into an equivalent ground formula, and ground formulas 



This method embeds Presburger arithmetic into a stronger arithmetic theory, represented by the au- 
tomata, then performs elimination over these automata. This partly explains why it is difficult to recover a 
Presburger formula from the resulting automaton. 

One does not need the full language of quantified Peano formulas for the problem to become unde- 
cidable. It is known that there exists no algorithm that decides whether a given nonlinear Diophantine 
equation (a polynomial equation with integer coefficients) has solutions, and that deciding such a problem is 
equivalent to deciding Turing's halting problem, in other words, it is impossible to decide whether a formula 
P{xi , . . . ,Xn) — A a;i > A • • • A a;„ > is satisfiable over the integers. See the literature on Hilbert's tenth 
problem, e.g. the book by Matiyasevich [73) . 
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are trivially decidable, it follows that there can exist no quantifier elimination algorithm for 
this theory. 

The situation is wholly different over the real numbers. The satisfiability or equivalence 
of polynomial formulas does not change whether the models are taken over the real numbers, 
the real algebraic numbers, or, for the matter, any real closed field; this theory is thus 
known as the theory of real closed fields, or elementary algebra. Tarski |105[ I106J and 
Seidenberg |101j showed that this theory admits quantifier elimination, but their algorithms 
had impractically high complexity. Collins [28] introduced a better algorithm based on 
cylindrical algebraic decomposition. For instance, quantifier elimination on 3x ax'^ + bx + c = 
by cylindrical algebraic decomposition yields 

c<0AM6<0Aa>— jV(6 = 0Aa>0)v(6>0Aa>— jjJVc = OV 

c>0An6<0A a< — J V(6=0Aa <0) V (6>0Aa< ^jj) (2.1) 

Note the cylindrical decomposition: first, there is a case disjunction according to the values 
of c, then, for each disjunct for c, a case disjunction for the value of b; more generally, 
cylindrical algebraic decomposition builds a tree of case disjunctions over a sequence of 
variables vi,V2, . . . , where the guard expressions defining the cases for Vi can only refer 
to vi, . . . ,Vi. This decomposition only depends on the polynomials inside the formula and 
not on its Boolean structure, and computing it may be very costly even if the final result 
is simple. This is the intuition why despite various improvements [10^ ^^ the practical 
complexity of quantifier elimination algorithms for the theory of real closed fields remain 
high. The theoretical space complexity is doubly exponential |2H I40j . This is why our 



results in ^7.2 are of a theoretical rather than practical interest. 

Minimal extensions to this formula language may lead to undecidability. This is for 
instance the case when one adds trigonometric functions: it is possible to define vr as the 
least positive zero of the sine, then define the set of integers as the numbers k such that 
sin(A:7r) = 0, and thus one can encode Peano arithmetic formulas into that language [2j. 
Also, naive restrictions of the language do not lead to lower complexity. For instance, 
limiting the degree of the polynomials to two does not make the problem simpler, since 
formulas with polynomials of arbitrary degrees can be encoded as formulas with polynomials 
of degree at most two, simply by introducing new variables standing for subterms of the 
original polynomials. For instance, 3x ax^ + bx'^ + cx + d = can be encoded, using Horner's 
form for the polynomial, as 3x3y3z z = ax + bAy = zx + cAyx + d = 0. Certain stronger 
restrictions may however work; for instance, if the variables to be eliminated occur only 



linearly, then one can adapt the substitution methods described in ^2.2.1 



3. Optimal Abstraction over Template Linear Constraint Domains 

When applying abstract interpretation over domains of linear constraints, such as oc- 
tagons [711 ESl EZ], one generally applies a widening operator, which may lead to impre- 
cisions. In some cases, acceleration techniques leading to precise results can be applied 
[52l [53] : instead of attempting to extrapolate a sequence of iterates to its limit, as does 
widening, the exact limit is computed. In this section, we describe a class of constraint 
domains and programs for which abstract transfer functions of loop-free codes and of loops 
can be exactly computed; thus the optimality. Furthermore, the analysis outputs these 
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functions in closed form (as explicit expressions combining linear expressions and func- 
tional if-then-else constructs), so the result of the analysis of a program fragment can be 
stored away for future use; thus the modularity. Our algorithms are based on quantifier 
elimination over the theory of real linear arithmetic (^2.2l). 



3.1. Template Linear Constraint Domains. Let F be a formula over linear inequalities. 
We call F a domain definition formula if the free variables of F split into n parameters 
pi, . . . ,Pn and m state variables si, . . . ,Sm- We note 'jp '■ Q" — >• 'P(Q'") defined by ^f{p) = 
{s E Q™ I (p, s) 1= -F}. As an example, the interval abstract domain for 3 program variables 
•si, S2, S3 uses 6 parameters mi, Mi,m2, M2, ms, M3; the formula is mi < si < Mi A ?7i2 < 
S2 < M2 A ms < S3 < M3. 

In this section, we focus on the case where F is a conjunction Li(si, . . . , Sm) < Pi A 
• • • A L„(si, . . . , Sm) < Pn of linear inequalities whose left-hand side is fixed and the right- 
hand sides are parameters. Such conjunctions define the class of template linear constraint 
domains [99]. Particular examples of abstract domains in this class are: 

• the intervals (for any variable s, consider the linear forms s and — s); because of the 
inconvenience of talking intervals of the form [—a, 6], we shall often taking them of 
the form [2;min,a;max], with the optimal value for Xmin being a greatest lower bound 
instead of a least upper bound; 

• the difference bound matrices (for any variables si and S2, consider the linear form 
si -S2); 

• the octagon abstract domain (for any variables si and S2, distinct or not, consider 
the linear forms ibsi it S2) |74| 

• the octahedra (for any tuple of variables si, . . . , s^, consider the linear forms itsi • • -zt 

sn). m 

Remark that ^p is in general not injective, and thus one should distinguish the syntax 
of the values of the abstract domain (the vector of parameters p) and their semantics 
^f{p)- As an example, if one takes F to be si < pi A S2 < P2 A si + S2 < Ps, then both 
{pi,P2,P3) = (1) 1, 2) and (1, 1, 3) define the same set for state variables si and S2. li u < v 
coordinate- wise, then jf{u) C ^f{v), but the converse is not true due to the non- uniqueness 
of the syntactic form. 

Take any nonempty set of states W C Q*". Take for alii = 1, . . . ,m: pi = sup^g^y Li{s). 
Clearly, W C ^p{jpi,. . . ,pm), and in fact p is such that 7f(p) is the least solution to this 
inclusion, pi belongs in general to M U {+00}, not necessarily to Q U {+cxd}. (for instance, 
ii W = {si \ Si < 2} and Li = si, then pi = \/2)- We have therefore defined a map 
ap '■ V{M^) —7- {_L} U (M U {+c«})"', and {aF,'jF) form a Galois connection: ap maps 
any set to its best upper-approximationj^ The fixed points of ap o ^p are the normal 
forms; the normal form of x^ is the minimal abstract element that stands for the same 



See e.g. ^^ for more information on Galois connection and their use in static analysis. Not all abstract 
interpretation techniques can be expressed within Galois connections. Indeed, there are abstract domains 
where there is not necessarily a best abstraction of a set of concrete states, e.g. the domain of convex 
polyhedra, which has no best abstraction for a disc, for instance. In this article, all abstractions, except the 
non-convex ones of I 



4.3 are through Galois connections. 
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concrete set as x'^jj For instance, si < 1 A S2 < 1 A si + S2 < 2 is in normal form, while 
si < 1 A S2 < 1 A si + S2 < 3 is not. 

3.2. Optimal Abstract Transformers for Program Semantics. We shall consider the 
input-output relationships of programs with rational or real variables. We first narrow the 
problem to programs without loops and consider programs built from linear arithmetic 
assignments, linear tests, and sequential composition. Noting a,b, . . . the values of program 
variables a, b . . . at the beginning of execution and a' ,b' , . . . the output values, the semantics 
of a program P is defined as a formula IP} such that {a,b, . . . ,a' ,b' , . . .) |= P if and only if 
the memory state (a', 6', . • • ) can be reached at the end of an execution starting in memory 
state (a, 6, . . . ): 

Arithmetic: [a := L{a, 6, . . . ) + Kjp = a' = L{a, b, . . .) + K Ab' = b A c' = c A . . . 
where K is a. real (in practice, rational) constant and L is a linear form, and b,c,d... 
are all the variables except a; 
Tests: [if c then pi else P2I = (c A Ipijp) V (^c A Mf); 
Non deterministic choice: [a := random| =b' = bAc' = cA..., for all variables 

except a; 
Failure: [f ailj = false; 
Skip: [skipj =a' = aAb' = bAc^ = cA... 
Sequence: [Pi; P2IF = 3a", b" , . . . /i A/2 where /i is [PiJf where a' has been replaced 

by a", b' by b" etc., /2 is [[P2IF where a has been replaced by a" , b by b" etc. 
In addition to linear inequalities and conjunctions, such formulas contain disjunctions 
(due to tests and multiple branches) and existential quantifiers (due to sequential composi- 
tion) . 

Note that so far, we have represented the concrete denotational semantics exactly. This 
representation of the transition relation using existentially quantified formulas is evidently 
as expressive as a representation by a disjunction of convex polyhedra (the latter can be 
obtained from the former by quantifier elimination and conversion to disjunctive normal 
form), but is more compact in general. 

Consider now a domain definition formula F = Li{si,S2, ■ ■ ■) < PiA- • •AL„(si, S2, ■ ■ ■) < 
Pn on the program inputs, with parameters p and free variables s, and another F' = 
L'^{s'i, s'2, ■ ■ .) < p'l A ■ ■ ■ A L'^{s'i, S2, . . .) < p'^i on the program outputs, with parame- 
ters pi and free variables s' . Sound forward program analysis consists in deriving a safe 
post- condition from a precondition: starting from any state verifying the precondition, one 
should end up in the post-condition. Using our notations, the soundness condition is written 

ys,s' FAlPj =^ F' (3.1) 



The free variables of this relation are p and p': the formula links the value of the parameters 
of the input constraints to admissible values of the parameters for the output constraints. 
Note that this soundness condition can be written as a universally quantified formula, with 



In the terminology of some authors, these can be referred to as the reduced forms or closed forms, and 
the ap o ^p operation is a reduction or closure. For instance, in the octagon abstract domain, the closure 
ap o "fp is implemented by a variant of Floyd- Warshall shortest path [741 177| . 
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no quantifier alternation. Alternatively, it can be written as a conjunction of correctness 
conditions for each output constraint parameter: 

C[ 4 V.-, ? F^ [PI =^ L[{s') < p[. (3.2) 

Let us take a simple example: if P is the program instruction z := x + y, F = x < 
Pi Ay < P2i F' = z < p'l, then [PJ = z' = x + y, and the soundness condition is Vx, y, z' {x < 
Pi A y < p2 A z' = X + y =^ z' < p'l)- Remark that this soundness condition is equivalent 
to a formula without quantifiers p'l > Pi + P2, which may be obtained through quantifier 
elimination. Remark also that while any value for p\ fulfilling this condition is sound (for 
instance, p'l = 1000 for pi = p2 = 1), only one value is optimal {p'l = 2 for pi = p2 = I). An 
optimal value for the output parameter p[ is defined by O'^ = C[f\yq[ {C[[q[/p'j\ =^ p[ < q[). 
Again, quantifier elimination can be applied; on our simple example, it yields p'^ = pi+P2- 

If there are n input constraint parameters pi, . . . ,p„, then the optimal value for each 
output constraint parameter p^ is defined by a formula 0[ with n+1 free variables pi, . . . ,pn,Pi: 

O^ = a A VpnCl[p"M] ^P[< ft) (3.3) 



Lemma 1. The formula O'^ defined at Eq. 3.3 using the correctness subformula C^ from 



Eq. 3.2, defines p[ as the least possible value for the parameter of the constraint Li after 



executing the transition [pj from a state verifying constraints F. 

Proof. O'i explicitly defines the least possible value of C[. C[ explicitly defines all the 
acceptable values for parameter p'^ in the postcondition constraint. D 

This formula defines a partial function from Q" to Q, in the mathematical sense: for 
each choice of pi, . . . ,pn, there exist at most a single p'^. The values of pi, . . . ,pn for which 
there exists a corresponding p^ make up the domain of validity of the abstract transfer 
function. Indeed, this function is in general not defined everywhere; consider for instance 
the program: 



if (x >= 10) { y = nondeterministic_choice_in_all_reals ; } 
else { y = 0; } 



li F = X < pi and F' = y < p'^, then 0[ = pi < 10 A p'-^ = 0, and the function is defined 
only for pi < 10, with constant value 0. 

At this point, we have a characterisation of the optimal abstract transformer corre- 
sponding to a program fragment P and the input and output domain definition formulas 
as n formulas (where n is the number of output parameters) O^ each defining a function (in 
the mathematical sense) mapping the input parameters p to the output parameter p'- . 

Another example: the absolute value function y := |x|, again with the interval abstract 
domain. The semantics of the operation is(x>0Ay = x)V(2;<0Ay = —x); the 
precondition is x S [xmin,Xniax] and the post-condition is y G [ymin, I/max] • Acceptable 
values for (ymin: Z/max) are characterised by formula 

(_y — VX Xjnin j; X S 3^max '' ymin _; \X\ _; Vuiax l"'^j 

The optimal value for y^ax is defined by O' = C" A Vy^^x C"[ymax/ymax] =^ ymax < ymax- 
Quantifier elimination over this last formula gives as characterisation for the least, optimal, 
value for y^ax: 

\Xvain ~r Xjnax ^ U A ymax — ^^inaxj V (,2^min ~r Xmax \ U A ymax — ^minj' l"'"3j 
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Listing 7: Optimal transformer for ymaxj corresponding to program y = \x\ with Xmin ^ 



X 2i ^^inax 




if (xmin + xmax >= 


0) { 


ymax = xmax; 
} else { 




ymax = — xmm; 
} 





We shall see in the next sub-section that such a formula can be automatically compiled 
into code (Listing [7]) . 

3.3. Generation of the Implementation of the Abstract Domain. Consider for- 



mula 3.5 defining an abstract transfer function. On this disjunctive normal form we see 
that the function we have defined is piecewise linear: several regions of the range of the 
input parameters are distinguished (here, Xmin + a^max < and Xmin + a;max > 0), and on 
each of these regions, the output parameter j/max is a linear function of the input parame- 
ters. Given a disjunct (such as ymax = — a^min A Xmm + a^max < 0), the domain of validity 
of the disjunct can be obtained by existential quantifier elimination over the result variable 
(here 3ymax (ymax = -Xmin A Xmin + ^max < 0)). The uuion of the domains of validity 
of the disjuncts is the domain of validity of the full formula. The domains of validity of 
distinct disjuncts can overlap, but in this case, since O'^ defines a partial function in the 
mathematical sense, that is, a relation R{x, y) such that for any x there is at most one 
y such that R{x,y), the functions defined by such disjuncts coincide on their overlapping 
domains of validity. 

This suggests a first algorithm for conversion to an executable form: 

(1) Put 0[ into quantifier-free, disjunctive normal form Ci V • • • V C„. 

(2) For each disjunct Cj, obtain the validity domain Vj as a conjunction of linear in- 
equalities; this corresponds to a projection of the polyhedron defined by Cj onto 
variables pi,...,pn, parallel to p'j. Then, solve Cj for p[ (obtain p'^ as a linear 
function Vi of the pi, ■ ■ ■ ,Pn)- 

(3) Output the result as a cascade of if-then-else and assignments. 



max 
max — 



Consider now the example at the end of §3.2| and especially Formula 3.5 defining y, 

as a function of Xmin and Xmax- \XYnm~^XYasLX d. A ymax = 2;maxj V (Xmin + ^Jmax < Ay, 

— a^min)- Let us first take the first disjunct Ci = Xmin + ^Jmax > A ymax = a^max- Its validity 
domain is 3ymax Ci, that is, Xmin + a^max > 0. Furthermore, on this validity domain, we can 
solve for ymax as a function of Xmin and a^max, and we obtain ymax = a^max- We therefore 
can print out the following test: 



if (xmin + xmax >= 0) { 
jnnax = xmax; 

} 



Now take the second disjunct C2 = (a;min + a^max < A ymax = — a;min)- It can similarly 
be turned into another test, which we put in the "else" branch of the preceding one. We 
thus obtain: 
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if (xmin + xmax >= 0) { 

ymax = xmax; 
} else 
if (xmin + xmax < 0) { 

ymax = —xmax; 

} 



Observe that in the above program, the second test is redundant. If we had been more 
clever, we would have realized that in the "else" branch of the first test, Xmin + x„iax < 
always holds, and thus it is useless to test this condition. Furthermore, in an if-then-else 
cascade obtained with the above method, the same condition may have to be tested several 
times. We shall now propose an algorithm that constructs an if-then-else tree such that no 
useless tests are generated. 

Algorithm 1 ToITETREE(i^, p',r): turn a formula defining p' as a function of the other 
free variables of F into a tree of if-then-else constructs, assuming that T holds. 

D{= Ci V • • • V C„) ^ QElimDNFModulo({}, F, T) 
for all Ci G L> do 

Pi <r- QElimDNFModulo(p',F,T) 
end for 

P <— PrEDICATES(Pi, . . . , Pn) 

if P = then 

Ensure: 3p' F is always true 

O^SohVE{D,p') 
else 

K ^ Choose(P) 

O ^ lfThenElse(i^,ToITETREE(F,p',rAK),ToITETREE(F,p',r A^K)) 
end if 

The idea of the algorithm is as follows: 

• Each path in the if-then-else tree corresponds to a conjunction C of conditions (if 
one goes through the "if" branch of if (a) and the "else" branch of if (b), then 
the path corresponds to a A ^b). 

• The formula O'^ is simplified relatively to C, a process that prunes out conditions 
that are always or never satisfied when C holds. 

• If the path is deep enough, then the simplified formula becomes a conjunction. This 
conjunction, as a relation, is a partial function from the pi, . . . ,pn to the p[ we wish 
to compute. Again, we solve this conjunction to obtain p[ as an explicit function of 
the pi, . . . ,pn- For instance, in the above example, we obtain ymax as a function of 

3^min and Xmax- 

We say that two formulas A and B are equivalent, denoted hy A = B, ii they have the 
same models, and we say that they are equivalent modulo a third formula T, denoted by 
A =T B, if^AT = B AT. Intuitively, this means that we do not care what happens where 
F is false, thus the terminology "don't care set" sometimes found for -iF. 

QELiMDNFMODULO(i/, F, T) is a function that, given a possibly empty vector of vari- 
ables V, a formula F and a formula T, outputs a quantifier-free formula F' in disjunctive 
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normal form such that F' =t 3t7 F and no useless predicates are used. In [81], we have 
presented such a function as a variant of quantifier elimination. 

We need some more auxiliary functions. PREDiCATES(i^) returns the set of atomic 
predicates of F. Solve(C, p') solves a conjunction of inequalities C for variable p' . If C 
does not contain redundant inequalities, then it is sufficient to look for inequalities of the 
form p' > L or p' < L and output p' = Lr\ Finally Choose(P) chooses any predicate in 
the set of predicates P; one good heuristic seems to be to choose the most frequent atomic 
predicate where p'^ does not occur. 



Let us take, as a simple example, Formula 3.5, We wish to obtain ymax as a function of 



Xmin and Xmax; SO in the algorithm ToITEtree we set p' = 2/max- C*! is the first disjunct 

3^min ~r 2^max ^ U A t/niax — ^max: ^2 IS tne SeCOnCl CllSJUnCt Xmin + 3^max < U A ymax — ^rnin- 

We project Ci and C2 parallel to ymax, obtaining respectively -Pi = (xmm + a^max > 0) and 
-P2 = (a^min + 2;max < 0). We choose K to be the predicate Xmin + a^max > (in this case, 
the choice does not matter, since Pi and P2 are the negation of each other). 

• The first recursive call to ToITEtree is made with T = (xmin + 2;max > 0). 
Obviously, F AT = (ymax = Xmax) A T and thus (3ymax-P) AT = T. 

QELiMDNFMODULO(ymax,-P, 2^) will then simply output the formula "true". It 
then suffices to solve for ymax in ymax = a^max- This yields the formula for computing 
the correct value of ymax in the cases where Xmm + a^max > 0. 

• The second recursive call is made with T = (xmin + a;max < 0. The result is 
ymax = — 2;mini the formula for computing the correct value of ymax in the cases 

where Xmin + a^max < 0. 

These two results are then reassembled into a single if-then-else statement, yielding the 



program at the end of ^3.2 



The algorithm terminates because paths of depth d in the tree of recursive calls cor- 
respond to truth assignments to d atomic predicates among those found in the domains 
of validity of the elements of the disjunctive normal form of F. Since there is only a fi- 
nite number of such predicates, d cannot exceed that number. A single predicate cannot 
be assigned truth values twice along the same path because the simplification process in 
QElimDNFModulo erases this predicate from the formula. 

This algorithm seems somewhat unnecessarily complex. It is possible that techniques 
based upon AIGs, performing simplification with respect to "don't care sets" |100j . could 
also be used. 

3.4. Least Inductive Invariants. We have so far considered programs without loops. 
The analysis of program loops, as well as proofs of correctness of programs with loops using 
Floyd-Hoare semantics, is based upon the notion of inductive invariants. A set of states / is 
deemed to be an inductive invariant for a loop if it contains the initial state and it is stable 
by the loop iteration — in other words, if it is impossible to move from a state inside I to 
a state outside / by one iteration of the loop. The intersection of all inductive invariants is 
also an inductive invariant — in fact, it is the least inductive invariant. 

Any property true over the least inductive invariant of a loop is true throughout the 
execution of that loop; for this reason, some authors call such properties invariants (without 



In practice, any library for convex polyhedra can provide a basis of the equalities implied by a polyhedron 
given by constraints, in other words a system of linear inequalities defining the afBne span of the polyhedron. 
It is therefore sufficient to take that basis and keep any equality where p' appears. 
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the "inductive" qualifier), while some other authors call invariants what we call inductive 
invariants. 

It would be interesting to be able to compute the least invariant (inductive or nonin- 
ductive) within our chosen abstract domain; in other words, compute the least element in 
our abstract domain that contains the least inductive invariant of the loop or program. Un- 
fortunately, this is in general impossible; indeed, doing so would entail solving the halting 
problem. Just take any program P, create a fresh variable, and consider the program that 
initialises x to 0, runs P, and then sets a; to 1. Clearly, the least invariant interval for this 
program and variable x is [0, 0] if P does not terminate, and [0, 1] if it does terminate. 

We thus settle for a simpler problem: find the least inductive invariant within our 
abstract domain, that is, the least element in our abstract domain that is an inductive 
invariant. Note that even on very simple examples, this can be vastly different from com- 
puting the least invariant within the abstract domain. Take for instance the simple program 
of Fig. Isl which has a couple of real variables (x, y) and rotates them by 45° at each itera- 
tion. The {x,y) couple always stays within the square [—1, 1] x [—1, 1], so this square is an 
invariant within the interval domain. Yet this square is not an inductive invariant, for it is 
not stable if rotated by 45°; in fact, the only inductive invariant within the interval domain 
is (— oo, +oo) X (— oo, +oo), which is rather uninteresting! Note that on the same figure, the 
octagon abstract domain would succeed (and produce a regular octagon centered on (0, 0)). 

3.4.1. Stability Inequalities. Consider a program fragment: while (c) { p; }. We have do- 
main definition formulas F = Li(si, . . . , Sm) < Pi A • • • AL„(si, . . . , Sm) < Pn for the precon- 
dition of the program fragment , and F' = L[{si, . . . , Sm) < p'l A ■ ■ ■ A L'^{si, . . . , Sm) < p'n' 
for the invariant. 

Define G = [cj A [[pj . G is a formula whose free variables are si, . . . , Sm, s'l, . . . , s'^ such 
that (si, . . . , Sm, s'l, . . . , s'^) 1= G if and only if the state (s'^, . . . , s'^) can be reached from 
the state (si, . . . , Sm) in exactly one iteration of the loop. A set T^ C Q™ is said to be an 
inductive invariant for the head of the loop if Vs G V7, Vs' (s, s') |= G =^ s' S W. We 
seek inductive invariants of the shape defined by F' , thus solutions for p' of the stability 
condition: 

Vs, s' F' AG =^ F'[s'/s\. (3.6) 

Not only do we want an inductive invariant, but we also want the initial states of the 
program to be included in it. The condition then becomes 

H = (Vs, F ^ F')A (Vs, s' F' AG =^ F'[s'/s\) (3.7) 

This is an invariant condition for the head A of a loop written as: 

loop : 

/* A */ 

if (! c) goto end; 

/* B */ 

loop body 

goto loop ; 
end : 
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Figure 5: The least fixed point representable in the domain (lfp{ao fo^)) is not necessarily 
the least approximation of the least fixed point (a(lfp/)) inside the abstract 
domain. For instance, if we take a program initialised by x G [—1, 1] and y = 0, 
and at each iteration, we rotate the point by 45°, the least invariant is an 8- 
point star, and its best approximation inside the abstract domain of intervals 
is the square [—1,1]^. However, this square is not an inductive invariant: no 
rectangle (product of intervals) is stable under the iterations, thus there is no 
abstract inductive invariant within the interval domain. Using the domain of 
convex polyhedra, one would obtain a regular octagon. 



Alternatively, one can consider an invariant condition for location B. The condition 
then becomes 

H.it = Vs [cl A (F V {3s~^' M W'/s, s/s'] A F' [s^'/s\ ) ) ^ F' (3.8) 

This alternate condition is very similar to the previous one (a universally quantified formula 
with no alternation, since the 3 is negated). For the sake of simplicity, we shall only discuss 
the treatment of formula H; formula //alt can be treated in the same way. 

This formula links the values of the input constraint parameters pi, ... ,pn to acceptable 
values of the invariant constraint parameters p'^, . . . , p'^, . In the same way that our soundness 
or correctness condition on abstract transformers allowed any sound post-condition, whether 
optimal or not, this formula allows any inductive invariant of the required shape as long as 
it contains the precondition, not just the least one. 

The intersection of sets defined hy p'l andp'2 is defined by inm{p'i,p'2). More generally, 
the intersection of a family of sets, unbounded yet closed under intersection, defined by 
p' £ Z is defined by uim{p' \ p' G Z}. We take for Z the set of acceptable parameters p' 
such that p' defines an inductive invariant and Vs, F =^ F'; that is, we consider only 
inductive invariants that contain the set / = {s | F} of precondition states. 
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We deduce that p[ is uniquely defined by: p' = min(3p']^, . . . ,p'j_x,p'+i, • • • ,p'„/ H) which 
can be rewritten as 

0[ ^ (Vi, • • • ,A-i,p',+i. .■■,Pn'H)A {yq' H[q'/p'] =^ p', < q'^ (3.9) 

The free variables of this formula are pi, . . . ,pn,p[. This formula defines a function (in the 
mathematical sense) defining p[ from pi, ■ ■ ■ ,Pn- As before, this function can be compiled 
to an executable version using cascades or trees of tests. 



Lemma 2. The formula 0[ defined at Eq. 3.9 defines the least value of p^ for an inductive 



invariant of the shape defined by F for the transition relation defined by G. 



Proof. Similarly as for lemma [T| the formula H defined at Eq. 3.7 defines a set Y of ad- 
missible p'l,. . ■ ,p'„, such that F = Li{si,. . . , Sm) < p'l A ■ ■ ■ A L„(si, . . . , Sm) < p'„/ is an 
inductive invariant for the loop. Formula O'^ defines p[ to be inf{p'/ | {p'(, . . . ,p'^) £ Y}. in 
other words, {p'l, . . . ,p^/) = inf y. D 

Thus the overall operation of the analysis method: 

We start from quantified formulas O^ defining the least inductive invariant of the loop 
in the abstract domain (Lemma [2]). We eliminate quantifiers from these formulas; since 
this does not change their models, the resulting formulas without quantifiers also define the 
least inductive invariant in the abstract domain. 

• Either the problem had no precondition parameters pi, . . . ,pn, and thus each for- 
mula 0[ has only one variable p[. It consists of linear (in)equalities, and has a single 
model for p'^, which is straightforward to extract. Collect these values, one obtains 
the values defining the least invariant (p'^, . . . ,p'^i) in the abstract domain. 

• Either the problem has precondition parameters and one employs one of the methods 
in §3.3| to obtain equivalent executable code. 

The overall correctness of the method is quite tautological. We start from a non- 
executable specification of the least inductive invariant in the abstract domain in the form 
of quantified formulas. We eliminate the quantifiers from these formulas, then process them 
into equivalent executable code. At all steps, we have preserved logical equivalence with the 
original definition. In short, we have synthetized the implementation of the best transformer 
from its specification. 

3.4.2. Simple Loop Example. To show how the method operates in practice, let us consider 
first a very simple example (nondet() is a nondeterministic choice): 

int i=0; 

while ( i <= n) { 
if (nondet()) { 
i = i-i-l; 
if (i == n) { 

i=0; 
} 
} 
} 
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Let US abstract i at the head of the loop using an interval [imm,imax]- For simplicity, 
we consider the case where the loop is at least entered once, and thus i = belongs to 
the invariant. For better precision, we model each comparison x ^ y over the integers as 
X >= y + \y X <= y — 1; similar transformations apply for other operators. The formula 
expressing that such an interval is an inductive invariant is: 

^min < A < finax A ViVi' {{imm <i M< imaxA 

(((i + l<n-lVi + l>n+l)Ai' = i + l)V 

(i + 1 = n + 1 A i' = 0) V i' = i)) =^ {i^in <i' M' < imax)) (3.10) 
Quantifier elimination produces: 

(«mm < A imax > A -Jmax < n A -imm + n - 2 < 0)V 

(«mm < A Zmax > A «max - n + 1 > A imax < n) (3.11) 

The formulas defining optimal imm and imax are: 

^min > A imm < A n > (3.12) 

(imax = 0An>0An<2)V (i^ax = n - 1 A i^ax > 1) (3.13) 

We note that this invariant is only valid for n > 0, which is unsurprising given that we 
specifically looked for invariants containing the precondition i = 0. The output abstract 
transfer function is therefore: 



if (n <= 0) { 

failO; 
} else { 

iMin = 0; 




if (n < 2) { 
iMax = 0; 




} else /* n >= 


2 */ 


iMax = n-1; 

} 
} 





The case disjunction n < 2 looks unnecessary, but is a side effect of the use of rational 
numbers to model a problem over the integers. The resulting abstract transfer function is 
optimal, but on such a simple case, one could have obtained the same using polyhedra [36] 
or octagons [73] . 

We have already noted ( ^2.1[ ) that even with we replace n by the constant 10, the 
classical widening/narrowing approach will fail to identify the least invariant of this loop, 
and that extra techniques such have widening with thresholds have to be used. 

3.4.3. Synchronous Data Flow Example: Rate Limiter. To go back to the original problem 
of floating-point data in data-flow languages, let us consider the following library block: a 
rate limiter. As seen in Listing [8| such a block in inserted in a reactive loop, as shown 
below, where assume(c) stands for if (c) {} else { fail ();} and fail () aborts execution. 

This block has three input streams el, e2, and e3, and one output stream si. In 
intuitive terms, si is the same as el but where the maximal slope of the signal between two 
successive clock ticks is bounded by e2, thus the name rate limiter. At some points in time 
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Listing 8: Rate liniiter. 



while (true) { 

el = random ; assume (el >= elmin && el <= elmax); 
e2 = random ; assume (e2 >= e2min && e2 <= e2max ) ; 
e3 = random (); assume (e3 >= eSmin && e3 <= e3max); 
oldsl = si ; 
if (nondetO) { 

si = e3 ; 
} else { 

if (el - oldsl < -e2) { 
si = oldsl — e2 ; 

} 

if (el - oldsl > e2) { 

si = oldsl + e2 ; 

} 
} 



Listing 9: If then else tree 



if 


(elmax 


> e3max) 


{ 




slmax = 


elmax; 




I 


else { 








slmax = 


e3max ; 




} 









(modelled by nondeterministic choice), the value of the signal is reset to that of the third 
input e3. 

We are interested in the input-output behaviour of that block: obtain bounds on the 
output si of the system as functions of bounds on the inputs (el, e2, e3). One difficulty 
is that the si output is memorised, so as to be used as an input to the next computation 
step. The semantics of such a block is therefore expressed as a fixed point. 

We wish to know the least inductive invariant of the form siy^m < si < simax under 
the assumption that eimin < eimax ^ e2min ^ e2max ^ esjnin < csmax- The stability condition 
yields, after quantifier elimination and projection on simax of the condition simax ^ eimax^^ 
■Slmax ^ ^SmsLx- Minimisation then yields an expression that can be compiled to an if-then- 
else tree (Listing [9]) . 

This result, automatically obtained, coincides with the intuition that a rate limiter (at 
least, one implemented with exact arithmetic) should not change the range of the signal 
that it processes. 

This program fragment has a rather more complex behaviour if all variables and op- 
erations are IEEE-754 floating-point, since rounding errors introduce slight differences of 
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regimes between ranges of inputs. Rounding errors in the program to be analysed introduce 
difficulties for analyses using widenings, since invariant candidates are likely to be "almost 
stable" , but not truly stable, because of these errors. Again, there exist workarounds so 



that widening-based approaches can still operate [,15ij Sec. 7.1.4]. We shall see in ^4.5 how 
to correctly abstract floating-point behaviour within our framework; unfortunately, the for- 
mulas produced tend to be very large due to many case disjunctions. The implementation 
of the abstract transformer produced for the above rate limiter in floating-point does not 
fit within one page of article, this is why we omitted it. 

4. Extensions of the framework using real linear arithmetic 

We shall describe here a few extensions to the class of programs and domains that we 
can handle, all of which are still based on quantifier elimination over real linear arithmetic. 
(In qTl we shall investigate extensions using other arithmetic theories.) 

4.1. Emptiness. We have so far supposed that the statement where the inductive invariant 
is computed is reachable, and thus that there exists some nonempty set of initial states 
that constrain the inductive invariant from below. More generally, and especially for the 



constructs described in ^4.4 and ^5.1, it may be necessary to encode the bottom element _L 
of the abstract domain, which represents the empty set of states. This can be done using 
one Boolean variable per element that might be empty: instead of template parameters 
(pi, . . . ,Pn), we will have {b,pi, . . . ,Pn), with the semantics that 7(false,|?i, . . . ,pn) = and 
7(true,pi,. . .,pn) = {s I Vi Lj(s) <pi}. 

Sankaranarayanan et al. [99j use pi = — oo for this, but in our framework, infinities 
themselves have to be encoded using Booleans, as we'll see in the next subsection. Further- 
more, if we have an abstract element in normal form with a constraint Li{vi, . . .) < — oo, 
it means that all pj are — oo and thus it is sufficient to have a single Boolean variable for 
all of them. 



4.2. Infinities. The techniques explained in Sec. 3.1 allow only finite bounds. Consider for 
instance the following program: 



X 


= 






w 


hile 


(true) 


{ 




X = 


x + 1; 




} 









We would like to obtain as a result of the analysis that x lies in [0, oo). Yet, what will 
happen is that the formula describing the couples (xmin;a;max) defining inductive invariants 
a^min < X < Xmax will be simplified to false. 

More annoyingly, with the following program: 

x = y; 

while (true) { 
X = x + 1; 
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we would at least hope to infer that Xmin = IJmm- Yet, if we look for an invariant of the 



form Xmin < X < Xmaxj there is no solution for any value of (ymim ymax)! In { 7.1 we shall see 
an example where it is actually interesting to infer the domain of values of the precondition 
where the least inductive invariant interval is finite, and that this domain can simply be 
obtained by existential quantification on the parameters of the inductive invariant followed 
by elimination. But in general, this is not what we want; instead, we would prefer to allow 
infinite values in the p and p' . 

This can be easily achieved by a minor alteration to our definitions. Each parameter 
Pi is replaced by two parameters p^ and pf^. p°° is constrained to be in {0,1} (if the 
quantifier elimination procedure in use allows Boolean variables, then pf can be taken as a 
Boolean variable); pf = means that pi is finite and equal to p\, pf = 1 means pi = +oo. 
Li < Pi becomes {pf > 0) V {Li < p\), Li < pi becomes {pf > 0) V {Li < p\). After this 
rewriting, all formulas are formulas of the theory of linear inequalities without infinities and 
are amenable to the appropriate algorithms. 

Unfortunately, the added combinatorial complexity induced by these Boolean variables 
tends to lead to intolerable computation times in the quantifier elimination procedures. 
Further work is needed, probably in the direction of better quantifier elimination procedures 
for combinations of Boolean and real quantified variables. Alternatively, one can envision 
directly including reasoning about infinities inside these procedures, though this is of course 
a delicate matter because of the possibility of generation of indeterminate forms oo — oo if 
formulas are handled without special care. 



4.3. Non-Convex Domains. Section |3.1| constrains formulas to be conjunctions of in- 
equalities of the form Li < Pi. What happens if we consider formulas that may contain 
disjunctions? 



The template linear constraint domains of section 3.1 have a very important property: 
they are closed under (infinite) intersection; that is, if we have a family p £ W, then there 
exists pq such that C\pew^F'{'P) = "yF{po) (besides, po = mi{p \ p G W}). This is what 
enables us to request the least element that contains the exact post-condition, or the least 
inductive invariant in the domain: we take the intersection of all acceptable elements. 

Yet, if we allow non-convex domains, there does not necessarily exist a least element 
^f{'P) such that S C ^f{'P)- Consider for instance S = {0, 1,2} and F representing unions 
of two intervals ((— x < pi A x < P2) V {—x < ps A x < P4)) A p2 < —P3- There are two, 
incomparable, minimal elements of the form Jf{p)'- Pi = P2 = A p^ = —1 A p4 = 2 and 
pi = 0Ap2 = lAp3 = -2Ap4 = 2. 

We consider formulas F built out of linear inequalities Lj(si, . . . ,s„) < pi as atoms, 
conjunctions, and disjunctions. By induction on the structure of F, we can show that 
7ir : (MU {—00})"' — )• V{W^) is inf-continuous; that is, for any descending chain {pi)i<^i such 
that \\in.iPi = Poo, then ^F{'Pi) is decreasing and r\i^ilF{'Pi) = 7f{Poo)- The property is 
trivial for atomic formulas, and is conserved by greatest lower bounds (A) as well as binary 
least upper bounds (V). 

Let us consider a set S C 'P(M"'), stable under arbitrary intersection (or at least, 
greatest lower bounds of descending chains). S can be for instance the set of invariants of a 
relation, or the set of over-approximations of a set W. jp {S) is the set of suitable domain 
parameters; for instance, it is the set of parameters representing inductive invariants of 
the shape specified by F, or the set of representable over-approximations of W. 7^ {S) is 
stable under greatest lower bounds of descending chains: take a descending chain {pi)i£i, 
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Figure 6: The state space is partitioned into x < and x > 0, and on each element of 
the partition we have a product of intervals, respectively [—5,-2] x [4,7] and 
[1,7] X [0,5]. Without the disjunction, we would have had to consider the much 
larger set [-5,7] x [0,7]. 



then 'ypQ.iuiiPi) = Di'yFiPi) G 5* by inf-continuity and stability of S. By Zorn's lemma, 
jp {S) has at least one minimal element. 

Let P\p\ be a formula representing 7^ (5) (Sec. 



3.1 



proposes formulas defining safe post- 
conditions and inductive invariants). The formula G[p\ = P[p]AVp' P[p']Ap' < p =^ p ^ p' 
defines the minimal elements of 7~^(S'). 

For instance, consider p = {a,b,c,d), F = {—x < a A x < b) V {—x < c A x < d), 
representing unions of two intervals [—a, b] U [— c, d]. We want upper-approximations of the 
set {0, 1, 3}; that is P[p\ =Vx(x = 0Vx = lVx = 3 =^ F[p, x]). We add the constraint 
that —a < b Ab < —c A —c < d, so as not to obtain the same solutions twice (by exchange 
of (a, 6) and (c, d)) or solutions with empty intervals. By quantifier elimination over G, we 
obtain (a = 0A6 = lAc = -3Ad = 3)V(a = 0A6 = 0Ac = -lAd = 3), that is, either 
[0,0]U[1,3] or [0, 1]U[3,3]. 



4.4. Domain Partitioning. Non-convex domains, in general, are not stable under inter- 
sections and thus "best abstraction" problems admit multiple solutions as minimal elements 
of the set of correct abstractions. As explained in the preceding subsection, is not very sat- 
isfactory nor efficient for analysis. There are, however, non-convex abstract domains that 
are stable under intersection and thus admit least elements as well as the template linear 
constraint domains of Sec. |3.1[ those defined by partitioning of the state space. 
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If, for instance, we know that a program or system behaves differently according to the 
sign of X, then we can decide in advance to partition the state space into x > and x < 0. 
On each element of the partition, we can have a separate template (example Fig. pi) . We 
can accommodate this within our framework. 

More formally, consider pairwise disjoint subsets (Ci)ig/ of the state space Q™, and 
abstract domains stable under intersection (S'j)jg/, Si C V{Ci). Elements of the partitioned 
abstract domain are unions Uje/ ■^i where Sj S Si. If (Uj^ij]) gj is a family of elements 

of the domain, then fljej (UiG/'^i.j) = UieiCljej ^i,ji that is, intersections are taken sepa- 
rately in each Cj. in other words, the parameters of the templates on each element of the 
partition can be dealt with independently of each other. 

Note the difference with the general disjunctive domains of §4.3[ in the general dis- 
junctive domains, there is no partition fixed a priori, this is why we may have several 
incomparable minimal elements [0, 0] U [1, 2] and [0, 1] U [2, 2] in the domain of disjunctions 
of two intervals representing the same set {0, 1,2}. For a fixed partition Cj and correspond- 
ing domains Si, for any set X, for every i, there is a least element representing X Ci in 
domain Si. This motivates the following construct: 

Take a family {Fi[p\)i^j of formulas defining template linear constraint domains (con- 
junctions of linear inequalities Li{si, . . . ,Sn) < Pi) and a family {Ci)i^i of formulas such 
that for all i and i' , Ci A Cj' is equivalent to false and Ci V • • • V C^ is equivalent to trus. 
F = (Ci AFi) V- • ■\/{Cif\Fi) then defines an abstract domain such that ^p is a inf-morphism. 
All the techniques of §3.1|then apply. 



For instance, by choosing Ci = x > 0, C2 = x < 0, Fi = x]^^^ < x < x^^^ A y]^^^ < x < 
ymax> and F2 = x^i^ < X < x^ax A y^i„ < x < y^^x; we can obtain Figure 



6 
The above constructions are equivalent to assigning a separate contro 



point to each 



element in the partition, with guards leading to these points according to the Ci, and then 



performing as described in ^5.1 



4.5. Floating-Point Computations. Real-life programs do not operate on real numbers; 
they operate on fixed-point or floating-point numbers. Floating point operations have few 
of the good algebraic properties of real operations; yet, they constitute approximations of 
these real operations, and the rounding error introduced can be bounded. 

In IEEE floating-point [61], each atomic operation (noting ©, 0, (gi, 0, ^ for operations 

so as to distinguish them from the operations +, — , x, / , / over the reals) is mathematically 

deflned as the image of the exact operation over the reals by a rounding function|_| This 
rounding function, depending on user choice, maps each real x to the nearest floating-point 
value r„(x) (round to nearest mode, with some resolution mechanism for non representable 
values exactly in the middle of two floating-point values), r_oo(2;) the greatest floating-point 
value less or equal to x (round toward —00), rj^oo(x) the least floating-point value greater 
or equal to x (round toward +00), ro(x) the floating-point value of the same sign as x but 
whose magnitude is the greatest floating-point value less or equal to |x| (round toward 0). 
If X is too large to be representable, r(x) = ±00 depending on the size of x 

We leave aside the peculiarities of some implementations, such as those of most C compilers over the 
32-bit Intel platform where there are "extended precisions" types used for some temporary variables and 
expressions can undergo double rounding [84] , 
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The semantics of the rounding operation cannot be exactly represented inside the the- 
ory of hnear inequahtiesj | As a consequence, we are forced to use an axiomatic over- 
approximation of that semantics: a formula linking a real number x to its rounded ver- 
sion r(x). 

Mine |75] uses an inequality \r{x) — x| < e^ei • \x\ + Sabsi where e^ei is a relative error 
and Eabs is an absolute error, leaving aside the problem of overflows. The relative error is 
due to rounding at the last binary digit of the significand, while the absolute error is due 
to the fact that the range of exponents is finite and thus that there exists a least positive 
floating-point number and some nonzero values get rounded to zero instead of incurring a 
relative error (or get rounded to a denormal, see below). 

Because our language for axioms is richer than the interval linear forms used by Mine, 
we can express more precise properties of floating-point rounding. We recall briefly the 
characteristics of IEEE-754 floating-point numbers. Nonzero floating point numbers are 
represented as follows: x = ib2^m where 1 < ?7i < 2 is the mantissa or significand, which has 
a fixed number p of bits, and e (i^min < e < -Bmax is the exponent). The difference introduced 
by changing the last binary digit of the mantissa is its.eiast where eiast = 2"'^"-^^: the unit 
in the last place or ulp. Such a decomposition is unique for a given number if we impose that 
the leftmost digit of the mantissa is 1 — this is called a normalised representation. Except 
in the case of numbers of very small magnitude, IEEE-754 always works with normalised 
representations. There exists a least positive normalised number ^normal and a least positive 
denormalized number mdenormah and the denormals are the multiples of widenormai less than 
rrinormai- All representablc numbers are multiples of rndenormai- 

We shall now attempt to define an imprecise axiomatisation of the relationship between 
the result of a floating-point operation and the result of the corresponding ideal operation 
over real numbers. For this, we shall distinguish operations plus and minus on the one 
hand, multiplication and division on the other hand, since the former have properties of 
which we can take advantage. 

Let us consider addition or subtraction x = ±a ± b. Suppose that < x < rn-normai- 
a and b are multiples of mdenormai and thus a — 6 is exactly represented as a denormalized 
number; therefore r{x) = x. If x > mnormah then \r{x) — x\ < Srei-X. In other words, if the 
result of an addition or subtraction is denormal, then it is exact j^ The cases for a; < are 
symmetrical. 

We therefore obtain the following axiomatisation of the rounding of a positive real 
number x, result of a floating-point addition or subtraction, to a floating-point value r, 
using round-to- nearest: 

Round^{r, x) = {x < m-normal A r = x) V (x > mnormal A -Erel-X < r - X < ^rel-a;) (4.1) 



To be pedantic, since IEEE floating-point formats are of a finite size, tfie rounding operation could be 
exactly represented by enumeration of all possible cases; this would anyway be impossible in practice due 
to the enormous size of such an enumeration. 

William Kahan has written extensively about the advantages of the existence and proper handling of 
denormals, otherwise known as gradual underflow, as opposed to flushing to zero all numbers too small to be 
approximated by normal numbers, a process known as flush to zero. For instance, with gradual underflow, 
a 6 = is equivalent to a = 6, quite a desirable property. See e.g. [63i p. 6]. Unfortunately, certain 
architectures do not implement gradual underflow, for the sake of efficient; then one has to use the Round 
and not the Round predicate. 
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We then use this predicate to construct an axiomatisation for rounding of numbers of any 
sign: 

Roun(f^{r,x) = (x = OAr = 0)V(x>OA Round^{r,x))\/ 

{x < A Round^i-r, -x)) (4.2) 
Equivalently, this last formula can be simplified into the equivalent: 

Round^{r, x) = (-mnormal <X< mnormal A X = r) 

V (x > mnormal A x(l - Erd) < r < x(l + Ercl)) 

V (x < -mnormal A x(l + Ercl) < r < x(l - Erd)) (4.3) 

Consider now multiplication or division x = a0b or x = a0b. Here, we cannot assume 
that if the result is denormal, then it is exact. A correct axiomatization of rounding for 
positive numbers is: 

Roundj^{r, x) = (x < mnormal A r > A X - Eabs < r < X + ffabs) 

V (x > mnormal A Srel-X <r - X < Erel-X) (4.4) 

where Sabs = m,denormal/2. Now for rounding for any sign: 

Round{r, x) = (x = A r = 0) V (x > A Round^{r, x))V 

(x < A Round+{-r, -x)) (4.5) 

To each floating-point expression e, we associate a "rounded-off" variable re, the value of 
which we constrain using Roundr {Vf., e) or Round{r(., e). For instance, a expression e = a®b 
is replaced by a variable re, and the constraint Round^ {r^, a + b) is added to the semantics. 
In the case of a compound expression e = ab + c, we introduce ei = ab, and we obtain 
Roundr{r(,,rei + c) A Round{r(.^,ab). If we know that the compiler uses a fused multiply- 
add operator, we can use Round{re, ab + c) instead. 

The drawbacks of such axiomatization of floating-point operations are that they in- 
troduce case disjunctions into formulas, leading to extra work for quantifier elimination 
procedures, and also that they make very unnatural coefficients appear — that is, ratio- 
nal numbers with large numerator and denominator, such as 1 + Erel or 1 — e^el, or very 
large integers such as 1 /mnormal- To go back to our very simple rate limiter running exam- 
ple ( ^3.4.3 ), analysis times are multiplied by 12 if one stops assuming that floating-point 



variables behave like reals, and instead uses some axiomatization j81l p. 9]. Furthermore, 
the formula produced is very large and hardly readable, doing many case disjunctions. 
Perhaps it is possible to simplify such large formulas by allowing some limited overap- 
proximation — for instance, we can replace the function mapping p to p' as defined by 
{0 < p < 1 < p' = p)V {p > I < p' = {l+e)p) by the function defined by < pAp' = {l+e)p, 
since the latter formula always gives a solution for p' greater or equal to that of the former. 
Even better, such simplifications could be performed during the elimination procedure. 
Further investigations are needed in that respect. 
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4.6. Integers. We have mentioned in §2. 2. 2| that Presburger arithmetic admits quantifier 
ehmination. We therefore could apply quantifier elimination in Presburger arithmetic, sim- 
ilarly as we do with respect to real linear arithmetic. Unfortunately, as we shall see in 
§7.1[ such an approach suffers from explosion in the size of the formulas and the cost of the 
algorithms. 

Instead, we used a relaxation approach: all integers are treated as reals; strict inequal- 
ities a < b where both sides are integers are recoded a < 6—1. For instance, if the program 
contains a if-then-else over x < y, then x < y is a precondition for the "then" branch, 
and X > y + 1 is a precondition of the "else" branch. Note that this means that traces of 
execution such that y < x < y + 1 are considered to fail. 



In some cases, such as the McCarthy 91 function example from ^5.2, it is necessary 
to constraint the reasoning procedures so that they consider that the negation of a < 6 is 
a >b + 1. We hope that improvements of quantifier elimination algorithms will be able to 
allow a more elegant approach. 

Another issue is that in many programming languages, integers are bounded and that 
arithmetic operations are actually performed modulo 2" (with n typically 8, 16, 32 or 64). 
The problem then lies within an enormous, but finite, state space. Clever techniques for 
reasoning about bit-vector arithmetic are being investigated by the SMT-solving community. 
Again, we hope that future work will provide good quantifier elimination techniques for this 
arithmetic, or combinations thereof with the linear theory of reals. 



4.7. Nonlinear constructs. In ^7.2 we explain how to fully deal with nonlinear program 
constructs and templates, at the expense of very high computational complexity. 

A more practical approach is to linearise the program expressions |78j |76| ch. 6]. If we 
encounter an assignment z := x * y and we know, perhaps through rough interval analysis, 
an interval y € [ymim ymax]i we can use the following abstraction of the semantics of this 
assignment: 

(x > A Xymin < Z < xymax) V (x < A Xymax < z' < Xymin) (4.6) 



If we know intervals for both x and y, then we can apply Eq. 4.6 to both (x, [ymin, ymax]) 



and {y, [xmin,a^max]), and take the conjunction of the resulting formulas. 

5. Complex control flow 

We have so far assumed no procedure call, and at most one single loop. We shall see 
here how to deal with arbitrary control fiow graphs and call graph structures. 



5.1. Arbitrary control graph and loop nests. In Sec. |3.4[ we have explained how to 
abstract a single fixed point. The method can be applied to multiple nested fixed points by 
replacing the inner fixed point by its abstraction. For instance, assume the rate limiter of 



Sec. 3.4.3 is placed inside a larger loop. One may replace it by its abstraction: 



if (elmax > eSmax) { 
slmax = elmax; 

} else { 

slmax = e3max; 

} 

assume (si <= slmax); 
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Listing 10: Loop nest 



1=0; 

while (true) { /* A */ 
if ( choice () ) { 
j=0; 

while ( j < i ) { /* B */ 
/* something */ 

} 

i = i +1; 

if (i==20) { 
i=0; 

} 
} else { 

/* something */ 

} 
} 



/* and similar for slmin */ 



Alternatively, we can extend our framework to an arbitrary control flow graph with 
nested loops, the semantics of which is expressed as a single fixed point. We may use the 
same method as proposed by Gulwani et al. \56\ §2] and other authors. First, a cut set 
of program locations is identified; any cycle in the control flow graph must go through at 
least one program point in the cut set. In widening-based fixed point approximations, one 
classically applies widening at each point in the cut set. A simple method for choosing a 
cut set is to include all targets of back edges in a depth-first traversal of the control-fiow 
graph, starting from the start node; in the case of structured program, this amounts to 
choosing the head node of each loop. This is not necessarily the best choice with respect to 
precision, though [561 §2-3]; Bourdoncle [IHl Sec. 3.6] discusses methods for choosing such 
as cut-set. 

To each point in the cut set we associate an element in the abstract domain, parame- 
terised by a number of variables. The values of these variables for all points in the cut-set 
define an invariant candidate. Since paths between elements of the cut sets cannot contain 
a cycle, their denotational semantics can be expressed simply by an existentially quantified 
formula. Possible paths between each source and destination elements in the cut-set de- 



fined a stability condition (Formula 3.6). The conjunction of all these stability conditions 



defines acceptable inductive invariants. As above, the least inductive invariant is obtained 



by writing a minimisation formula (Sec. 3.4). 



Let us consider the loop nest in Listing 10 We choose program points A and B as cut- 



set. At program point A, we look for an invariant of the form lA{i,j) = imm,A < i < ^max,A, 
and at program point B, for an invariant of the form lB{i,j) = iram,B < ^ < ima.x,B A jmin < 
j < jmax A 6mm 1^ i — j ^ (^max (& difference-bound invariant). The (somewhat edited for 
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brevity) stability formula is written: 

Vi Ia{0, j) A VfVi {{lB{i, j)/\j>i/\{i + l< 19V 

i + 1 = 20Vi + l > 21)) ^U[i + 1 = 20,/A(0,i),/A(« + l,j)])A 

ViVj {lAihj) ^ /ij(i,0)) AViVi ((/B(^,i) A j < i) 

^lB{i,j + l)) (5.1) 

where If[6, 61,62] = (6 A 61) V (-16 A 62). 

Replacing I^i and /^ into this formula, then applying quantifier elimination, we obtain 

a formula defining all acceptable tuples (imin,yl,^max,A,^mm,B, imax,B, jmin, jmax,^min, (^max)- 

Optimal values are then obtained by further quantifier elimination: imm,A = imm,B = Jmin = 

U, ^niax,A ^ ^maXjB ^ -Ly, Jmax ^ -^U, Omin ^ J-j "max ^ -Ly. 



The same example can be solved by replacing 20 by another variable n as in Sec. 3.4.2 



5.2. Procedures and Recursive Procedures. We have so far considered abstractions 
of program blocks with respect to sets of program states. A program block is considered 
as a transformer from a state of input program states to the corresponding set of output 
program states. The analysis outputs a sound and optimal (in a certain way) abstract 
transformer, mapping an abstract set of input states to an abstract set of output states. 

Assuming there are no recursive procedures, procedure calls can be easily dealt with. 
We can simply inline the procedure at the point of call, as done in e.g. Astree [T HITSllST] . 
Because inlining the concrete procedure may lead to code blowup, we may also inline its 
abstraction, considered as a nondeterministic program. Consider a complex procedure P 
with input variable x and output variable x. We abstract the procedure automatically with 
respect to the interval domain for the postcondition (^mm < z < ^max); suppose we obtain 
-^max := 1000; Zmin '■= X then we can replace the function call by z <= 1000 && z >= x. This 
is a form of modular interprocedural analysis: considering the call graph, we can abstract the 
leaf procedures, then those calling the leaf procedures and so on. We can also do likewise for 
nested loops: abstract the innermost loop, then the next innermost one, etc. This method 
is however insufficient for dealing with recursive procedures. 

In order to analyse recursive procedures, we need to abstract not sets of states, but sets 
of pairs of states, expressing the input-output relationships of procedures. In the case of 
recursive procedures, these relationships are the least solution of a system of equations. 

To take a concrete example, let us consider McCarthy's famous "91 function" |70U71j. 
which, non-obviously, returns 91 for all inputs less than 101: 



int M(int n) { 
if (n > 100) { 
return n — 10; 
} else { 

return M(M(n + ll)); 

} 
} 
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The concrete semantics of that function is a relationship R between its input n and its 
output r. It is the least solution of 

R^{{n,r)&I?\ (n > 100 A r = n - 10)V 

(n < 100 A 3n2 G Z(n + 11, ns) G i? A (ns, r) G R)} (5.2) 

We look for a inductive invariant of the form I = [{n > A) f\ [r — n > 5) f\ {r — n < 



A)) V ((n < B) f\ {r = C)), a non-convex domain (Sec. 4.3). By replacing R hy I into 
inclusion |5.2[ and by universal quantification over n,r,n2, we obtain the set of admissible 
parameters for invariants of this shape. By quantifier elimination, we obtain (C = 91) A ((5 = 
A = —10)A{A = 101) A(i? = 100) within a fraction of a second using MjOLLNiR (see Sec.[6|. 
In this case, there is a single acceptable inductive invariant of the suggested shape. In 



general, there may be parameters to optimise, as explained in Sec. 3.4 The result of this 
analysis is therefore a map from parameters defining sets of states to parameters defining 
sets of pairs of states (the abstraction of a transition relation). This abstract transition 
relation (a subset of X x y where X and Y are the input and output stat e set s) can be 



transformed into an abstract transformer in X' — t- y" as explained in Sec. 3.2 Such an 
interprocedural analysis may also be used to enhance the analysis of loops |72| . 

6. Implementations and Experiments 

We have implemented the techniques of Sec. [3] in quantifier elimination packages, in- 
cluding MathematicaJ^ and Reduce 3.^+ REDLOcj^in addition to our own package, 
Mjollnir [MJrj We ignore which exact techniques are implemented within Mathemat- 
ICA. IJ Redlog appears to implement some virtual substitution method |44l I107J . 

As test cases, we took a library of operators for synchronous programming, having 
streams of floating-point values as input and outputs. These operators are written in a 
restricted subset of C and take as much as 20 lines. A front-end based on CIL [85J con- 
verts them into formulas, then these formulas are processed and the corresponding abstract 
transfer functions are pretty-printed. Since for our application, it is important to bound 
numerical quantities, we chose the interval domain. 

Among the extensions in q4l we implemented those relevant to floating-point (^4.5|) 



and integers {{ 
functions (^5.2 



4.6), and did more manual experiments with infinities (|4.2[) and recursive 



For instance, the rate limiter presented in Sec. 3.4.3 was extracted from that library. 
Since this operator includes a memory (a variable whose value is retained from a call to 
the operator to the next one), its data-flow semantics is expressed using a fixed-point. 
When considered with real variables, the resulting expanded formula was approximately 



1 3 

Mathematica is a commercial computer algebra package available under an unfree license from 



Wol- 



|fram Research |108j . 

REDUCEjis a computer algebra package from Anthony C. Hearn, now available under a modified BSD 



licence. 



Redlog is an extension to Reduce for working over quantified formulas. 

Mjollnir is available under a free license from the author's home page In addition to the author's 



own quantifier elimination techniques, it implements Ferrante and Rackoff and Loos and Weispfenning 



L! 



Loos and Weispfenning 's quantifier elimination procedure is used by Mathematica to perform simpli- 



fications over linear inequalities 108. §A.9.5], but we are unsure whether this is the algorithm called by the 
Reduce function. 
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1000 characters long, and with floating point variables approximately 8000 characters long. 
Despite the length of these formulas, they can be processed by Mjollnir in a matter of 
seconds. The result can then be saved once and for all. 

Analysers such as Astree O [IS |37] must have special knowledge about such oper- 
ators, otherwise the analysis results are too coarse (for instance, the intervals do not get 
stabilized at all). The Astree development team therefore had to provide specialized, 
hand- written analyses for certain operators. In contrast, all linear floating-point operators 
in the library were analysed within a fraction of a second using the method in the present 
article, assuming that floating-point values in the source code were real numbers. If one 
considered instead the abstraction of floating-point computations using real numbers from 



Sec. |4.5[ computation times did not exceed 17 seconds per operator; the formulas produced 
are considerably more complex than in the real case. Note that this computation is done 
once and for all for each operator; a static analyser can therefore cache this information for 
further use and need not recompute abstractions for library functions or operators unless 
these functions are updated. 

Our analyser front-end currently cannot deal with non-numerical operations and data 
structures (pointers, records, and arrays). It is therefore not yet capable of directly dealing 
with the real control programs that e.g. AsTREE accepts, which do not consist purely of 
numerical operators. We plan to integrate our analysis method into a more generic analyser. 
Alternatively, we plan to adapt a front-end for synchronous programming languages such 
as SiMULiNK, a tool widely used by control/command engineers. 

The correctness of the methods described in this article does not rely on any particular- 
ity of the quantifier elimination procedure used, provided one also has symbolic computation 
procedures for e.g. putting formulas in disjunctive normal form and simplifying them. The 
difference between the various quantifier elimination and simplification procedures is effi- 
ciency; experiments showed that ours was vastly more efficient than the others tested for 
this kind of application. For instance, our implementation of our quantifier elimination al- 



gorithm |8T] was able to complete the analysis of the rate limiter of Sec. 3.4.3, implemented 
over the reals, in 1.4 s, and in 17 s with the same example over floating-point numbers, while 
Redlog took 182 s for the former and could not finish the latter, and Mathematica could 
analyse neither (out-of-memory) . On other examples, our quantifier elimination procedure 
is faster than the other ones, or can complete eliminations that the others cannot. 

7. Extensions to other numerical domains 

We have so far concerned ourselves solely with real (and possibly Boolean) variables 
appearing in linear arithmetic formulas. In Q we have seen how to reason over certain other 
data types (integers, floating-point values), but again modeling them as real numbers. Yet, 
in §2.2[ we pointed out that linear real arithmetic is not the only arithmetic theory with 
quantifier elimination algorithms; other well-known examples include Presburger arithmetic 
(linear integer arithmetic) and the theory of real closed fields (that is, polynomial real 
arithmetic). 

In this section, we report on using quantifier elimination in both these theories for 
computing optimal transformers or fixed points. Unfortunately, experiments have shown 
that the high complexity of the quantifier elimination algorithms for these theories and the 
lack of simplifications for the formulas they produce preclude their use in practice. The 
results in this section are thus mainly of theoretical interest. 
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7.1. Presburger arithmetic. The approach from ^4.6 relaxes integers to reals. It there- 
fore yields correct results, but might lead to overapproximations that could be avoided if 
integers were used instead. What if we used quantifier elimination on Presburger arithmetic 
instead? 

Consider the following simple example: 



int a, m; 

i = 0; 

while ( i < m) { 

i = i+a; 
} 

Let us abstract the loop variable a using an interval [l,h]. These bounds must satisfy 
I = I <0 < h (initialisation of the loop) and 

S = yi (I < i < u ^ {i > my I < i + a < u)) (7.1) 

The condition for the existence of a finite range of values for i is therefore 3l3u I A 5*1 | 
Intuitively, this condition is equivalent to m < V a > 0. Yet, the formula produced by the 
quantifier elimination for Presburger arithmetic from Redlog yields a very large formula, 
more than one page long, which we have therefore omitted] | Redlog cannot even perform 
simplifications such as replacing i = 0Vi = lVi = 2by0<z<2. 

In comparison, the real relaxation gives exact and fast results: 

S^ = yi {i < I y i > uy i > m + ly I < i + a < u)) (7.2) 

Eliminating quantifiers from 3/3u / A Sk yields immediately the answer a>OVm<— 1. 

7.2. Real polynomial constraint domains. We now consider the abstraction of program 
states (in M ) using domains defined by polynomial constraints, a natural extension of those 
seen previously ( ^3.1[ ); the orthogonal extensions from §|4]also apply. Instead of quantifier 
elimination in linear real arithmetic, we shall use quantifier elimination in the theory of real 
closed fields. One difference, though, is that we will not be able to produce nice, closed 
form formulas, at least not in general. 



7.2.1. Method. We generalise the constructs of ^ except those of § |3.3[ to formulas over 

polynomial inequalities. The same results hold: 

• For any loop-free program code, and any template polynomial abstract domain with 
parameters pi, . . . ,p„, there is a family of formulas Fi, . . . , F„ that uniquely defines 
the optimal parameters ^^ , . . . , p'^, of the postcondition with respect those pi, . . . ,pn 
in the precondition (the free variables of Fi are among pi, . . . ,p„,p'). 



This example is motivated by the fact that for a 7^ 0, the loop terminates if and only if [l,u] is finite. 
It is a simplified version of a loop termination problem from Paul Feautrier and Laure Gonnord. 

It could be that Redlog gives erroneous answers. At some point, we generated a formula F with free 
variables a, m such that Redlog produced falsewhen eliminating quantifiers from 3m3a F, and produced 
truewhen eliminating quantifiers from 3a3m F; at another point we made Reduce/Redlog crash with a 
segmentation fault. 
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• For any loop, and any template polynomial abstract domain with parameters pi, 

. . . ,Pn, there is a family of formulas Fi, . . . ,Fn that uniquely defines the optimal 

parameters p'l, ■ ■ ■ ,p^/ of the least inductive invariant for that loop, with respect 

those pi, . . . ,pn in the precondition (the free variables of Fj are among pi, . . . ,pn,p[)- 

The main obstacle is the high cost of quantifier elimination in the theory of real closed 

fields. The other crucial difference is that it is in general impossible to move from such 



a formula to a formula computing p'^ from pi, ■ ■ ■ ,Pn, as we did in § 3.3 By performing 
the cylindrical algebraic decomposition with variables pi, . . . ,pn first, we could obtain the 
tree structure with case disjunctions, as the output of Algorithm [T| But at the leaves, we 
would obtain formulas defining p'- as a specific root of a polynomial in the variable p^, with 
coefficients themselves polynomials in pi, . . . ,p„. 



In Sec. 3.3 we explained how to turn a formula over linear arithmetic defining a partial 
function from pi, . . . ,p„ to p[ — that is, a relation between {pi, ■ ■ ■ ,Pn,p'i) such that for 
any choice of pi , . . . , p„ there is at most one suitable p[ — into an algorithmic function, 
expressed using linear assignments and if-then-else over linear inequalities. Can we do the 
same here? That is, can we turn a formula over nonlinear arithmetic defining a partial 
function from pi, . . . ,pn to p[ into a simple algorithm written using, say, if-then-else tests 
and normal arithmetic operators as well as the n-th root operations "/ ? For instance, if 

we have a formula p' > Ap' = 2p, we would like to obtain p' = y/2p. 

Unfortunately, this is impossible in the general case. The Abel-Ruffini theorem, from 
Galois theory, states that for polynomials in one variable of degrees higher or equal to 5, 
there is in general no way to express the value of the roots using only arithmetic operations 
(-|-, — , X , /) and radicals ( «/). Thus, we cannot hope to obtain in general a simple algorithm 
expressing p[ as a function of pi, . . . ,p„ using tests, arithmetic operations (-I-, — , x, /) and 
radicals ( n/) . 

Let us now assume that there are no precondition parameters pi, . . . ,pn ov, equivalently, 
that we know exactly their value. Would it be at least possible to compute the values of 
the p'J 

p'j^ is defined as the only solution of a logical formula with a single free variable, built 
using polynomial arithmetic. By putting this formula into DNF, we can reduce the problem 
to computing the only solution, if any, of a conjunction of polynomial inequalities and 
equalities. Such a solution can be computed to arbitrary precision; in fact, for any e, one 
can obtain bounds [/, h] such that I < p'^ < h and h — I < e. Unfortunately, the cost of such 
computations is high. [10] 



7.2.2. Experiments. Consider Ix ^ x < hx, ly ^ y ^ hy and the problem of generating the 
optimal abstract transfer function for the multiplication operation, z := x * y. We wish to 
obtain l^ and hz such that Iz ^ z < hz, and Iz and hz are optimal {Iz is maximal, hz is 
minimal). We first define the set of admissible (not necessarily minimal) hz'. 

A = \/x\ly{lx < X < hx A ly < y < hy ^ xy < hz) (7-3) 

Now we define the least value for hz'- 

= AAyh{A[h/hz]^h>hz) (7.4) 

The free variables of this formula are Ix, hx, ly, hy and hz- 

Mathematica 7 performs quantified elimination by cylindrical algebraic decomposition 
on this formula in 4.3 s and yields a large formula (Fig. u\ with many case disjunctions. 
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ly <0A ((hy^lyAh^> ^ Ah^^l^ ly 

\j{ly <hy<OA {{l^ < A h^>l^Ah^^ IJy) V {la;>OAK>lxAh^^hy l^))) 
\/(hy>OA((l^<OA ((la:<K< j^Ah.^lJyj V (k>j^A h, :^ Kh, 
y{lx =0A hx>QAh^^ h^hy) V (?^ > A {{K ^ Ix A h^ ^ hyl^) 

y{K >lx/\K^K hy))))))) 

V {ly ^OA {{hy =OAh^>locAh^ = 0)V (\ > A {{l^ < A {{l^ < h^ < A h^ = 0) 
W {K > A h^ ^ Khy))) y (l^ = A h^ > A h,_ ^ h^hy) V {l^ > Q A {{K = lxAh^ = hyl^) 

V {h^ > l^Ah^^h^ hy))))))) 

\/(ly>OA ((hy=lyA((h<OA [[f^^ = J^^ h,=ljy 

\/ (h^> j^ Ah,^h^hyjjj\/il^^OAh^>OA /i^ = Khy)\/ 

lx>OA iiK = -J-^ A /l2 = IJy ] V (h^> -^-^ A /l2 = /ij; ky 
y{hy > lyA {{Ix < A {{K =lxAh^=l^ ly) V [l ^ < k^ < Q A k ^ = k^ ly) 

\J{K >QAh^^ Khy))) V (l^ = OAh^>OAhz^ h^hy) 

V(/, > A UK = lxf\h,^hy I,) V {K >lxf\h,^K hy))))))) . (7.5) 



Figure 7: This formula is the result of quantifier elimination. It defines hz to be the least 
upper bound of xy for a; G [Ix, h^] and y G [ly, hy\. 



most notably on the sign of Ix, hx, ly, hy. This is quite natural: the monotonicity of the 
function y >-^ xy changes according to the sign of x. This function is equivalent to the much 
more terse 

hz ^ in.ax[lxly, hxly, Ixhy, hxhy) ( ' -oj 

This illustrates the limit of our approach on nonlinear problems: even on simple program 
constructions and with simple invariants, quantifier elimination takes nonneglible time and 
outputs complicated formulas. We therefore did not pursue this direction further. 

8. Related work 

Since the first numerical abstract analysis techniques were proposed in the 1970s, there 
has been considerable work on improving precision, efficiency, or both. Without attempting 
to be exhaustive, we shall now describe a few of the approaches and how they differ from 
ours. 
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8.1. Relational abstract domains and modular analysis. There is a sizeable amount 
of literature concerning relational numerical abstract domains; that is, domains that ex- 
press constraints between numerical variables. Convex polyhedra were proposed in the 
1970s |361 l58j. and there have been since then many improvements to the technique; a bib- 
liography was gathered by Bagnara et al. @]. Algorithms on polyhedra are costly and thus 
a variety of domains intermediate between simple interval analysis and convex polyhedra 
were proposed [25 i I7i l l99] . 

It is possible to use relational abstract domains such as polyhedra to model the in- 
put/output relationship of a program, function or block [58l §7.2, p. 112]. Instead of 
considering only the current values of the program variables {vi, . . . ,Vn) at the various 
program points, one also considers the initial values {v^, . . . ,v^) of these variables at the 
beginning of the program, function or block; thus the computed polyhedra, for instance, 
relate {v^, . . . ,v^,vi, . . . ,Vn)- We employed this approach when dealing with recursive pro- 
cedures ( ^5.2| ). Such an approach is modular: one can for instance analyse a procedure 
in such a way, and plug the result of the analysis at each point of call; though of course 
one loses optimality. It also provides for modular analysis of loop nests: one first analy- 
ses the innermost loop, and then replace this innermost loop by the result of the analysis, 
considered as a nondeterministic program; one then proceeds to the next innermost loop. 

One limit of this approach is that the relationship between input and output is con- 
strained by the abstract domain. Most numerical abstract domains concern convex rela- 
tions: difference bound constraints, octagons, polyhedra etc. are all geometrically convex 
(given two points a, b in the concretisation, the segment [a, b] is also in the concretisation) . 
Note that the result of the analysis of the absolute value function ( ^3.2[ ), as expressed by 
Rel. 3.5, or that of the rate limiter (^3.4.3), are piecewise linear but not convex. 



The idea of producing procedure summaries [103J as formulas mapping input bounds 
to output bounds is not new. Rugina and Rinard [97], in the context of pointer analysis 
(with pointers considered as a base plus an integer offset), proposed a reduction to linear 
programming. This reduction step, while sound, introduces an imprecision that is difficult 
to measure in advance; our method, in contrast, is guaranteed to be "optimal" in a certain 
sense. Rugina and Rinard s method, however, allows some nonlinear constructs in the 
program to be analysed. Martin et al. |72| proposed applying interprocedural analysis to 
loops. 

Seidl et al. |102j also produce procedure summaries as numerical constraints. Our pro- 
cedure summaries are implementations of the corresponding abstract transformer over some 
abstract domain, while theirs outputs a relationship between input and output concrete val- 
ues. Their analysis considers a convex set of concrete input-output relationships, expressed 
as simplices, a restricted class of convex polyhedra. This restriction trades precision for 
speed: the generator and constraint representations of simplices have approximately the 
same size, while in general polyhedra exponential blowup can occur. Tests by arbitrary 
linear constraints cannot be adequately represented within this framework. Seidl et al. 
|102^ Sec. 4] propose deferring those constraints using auxiliary variables; this, however, 
loses some precision. Their analysis and ours are therefore incomparable, since they make 
different choices between precision and efficiency. 

Lai et al. \Wl] proposed an interprocedural analysis of numerical properties of functions 
using weighted pushdown automata. The "weights" are taken in a finite height abstract 
domain, while the domains we consider have infinite height. 
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8.2. Computations of exact fixed points. The limitations of the widening approach 
explained in §2.1| have been recognized for long. There has therefore been extensive research 
about computing precise inductive invariants, if possible the least inductive invariant inside 
the abstract domain considered. 

Several methods have been proposed to synthesize invariants without using widening 
operators [291 l32l [98] . In common with us, they express as constraints the conditions under 
which some parametric invariant shape truly is an invariant, then they use some resolution 
or simplification technique over those constraints. Again, these methods are designed for 
solving the problem for one given set of constraints on the inputs, as opposed to finding a 
relation between the output or fixed-point constraints and the input constraints. In some 
cases, the invariant may also not be minimal. 

Bagnara et al. [5l [6] proposed improvements over the "classical" widenings on linear 
constraint domains |58] . Gopan and Reps j54j introduced "lookahead widenings": standard 
widening-based analysis is applied to a sequence of syntactic restrictions of the original pro- 
gram, which ultimately converges to the whole program; the idea is to distinguish phases 
or modes of operation in order to make the widening more precise. Gonnord [52j, Gonnord 
and Halbwachs [53j, Leroux and Sutre [68j have proposed acceleration techniques: when 
the transition relation r is of certain particular forms, it is possible to compute its transi- 
tive closure r"*" exactly or with small imprecision. Typically, acceleration techniques have 
difficulties dealing with programs where the control flow is not flat, for instance when there 
are paths through a loop body that affect the iteration variables in different ways, such as 
the circular buffer example (Listing m|. 

Adje et al. [T], Costan et al. [31j, Gaubert et al. [39] proposed a "policy iteration" or 
"strategy iteration" approach j_| by downwards iterations providing successive over-appro- 
ximations of the least fixed point. Their approach can fail to converge to the least fixed 
point, for instance with expansive semantics such as those of the "circular buffer" example 
(Listing pj , though for some classes of programs it converges to the least fixed point fT] . 

Gawlitza and Seidl [51J proposed another policy iteration approach, which is guaranteed 
to provide the least fixed point of the system of abstract equations. In contrast to the above 
method, they use upwards iterations, so each value computed is an under-approximation of 
the abstract least fixed point. They extended that approach to template linear constraint 
domains [HU]. The differences with our approach are twofold: 

• Their approach computes the least fixed point of a system of min/max abstract 
equations, derived from the source code of the program. In intuitive terms, min's 
correspond to conditions (and closures operators in relational domains), and max's 
to "merge points" in the control fiow graph (end of if-then-else) . This approach 
thus incurs the same problem of "undistinguished paths" as the example from §2.1[ 
Even then, the policy iteration algorithm may iterate across a number of iterations 
exponential in the number of merge points. 

An alternative would be to consider a cut-set for the control fiow graph and 
distinguish each path between two points in the cut-set. in other words, for a 
single loop, one would consider each individual control path inside the loop body. 
The number of such paths is exponential in the number of tests, and thus the 



The terminology comes from game theory. In broad terms, their consider equations with max/min 
operators, which are similar to the "minimax" operators appearing in the definition of the value of games in 
game theory. Choosing which argument of a "min" is used corresponds, in game theory terms, of choosing 
a strategy or policy for a player that tries to minimise the value of the game. 
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"max" operation in the abstract equations would also have an exponential number 

of arguments. Such an approach would compute the same result as ours; however, 

it would be unworkable without further work to get rid of the explicitly exponential 

number of arguments in the "max" operation. 

• Their approach needs all preconditions exactly known. In contrast, we compute it 

as an explicit function of the precondition. In short, our invariants are parametric 

in the precondition, while theirs are not. 

Gulwani et al. |56] have also proposed a method for generating linear invariants over 

integer variables, using a class of templates. The methods described in the present article can 

be applied to linear invariants over integer variables in two ways: either by abstracting them 



using rationals (as in examples in Sec. 3.4.2, 5.1), either by replacing quantifier elimination 



over rational linear arithmetic by quantifier elimination over linear integer arithmetic, also 



known as Presburger arithmetic (^4.6[). Gulwani et al. instead chose to first consider integer 



variables as rationals, so as to be able to compute over rational convex polyhedra, then 
bound variables and constraint parameters so as to model them as finite bit vectors, finally 
obtaining a problem amenable to SAT solving. Program variables are finite bit vectors in 
most industrial programming languages, and parameters to useful invariants over integer 
variables are often small, thus their approach seems justified. We do not see, however, how 
their method could be applied to programs operating over real or floating-point variables, 
which are the main motivation for the present article. 

8.3. Limitations of template-based approaches. Much, if not all, of the published 
results on computing least inductive invariants in abstract domains, or at least abstract 
fixed points, deal with template domains |49t [5IH I56j . including intervals j31l EI]. The 
fundamental reasons for this are: 

• The domain of convex polyhedra is not closed under infinite intersection. Thus, in 
general, there is no best abstraction of a set of states. In general, there is no least 
inductive invariant inside the domain. 

• These methods replace the problem of dealing with arbitrarily complex shapes such 
as convex polyhedra by dealing with a vector of real numbers in finite, fixed dimen- 
sion. The coefficients of these vectors are then amenable to a variety of constraint 
solving techniques. 

A common criticism of template-based approaches, is that they suppose that one knows 
the interesting templates beforehands — in the case of linear constraint domains, interesting 
directions in space. In contrast, methods based on general convex polyhedra infer these di- 
rections themselves, through the convex hull and widening operations J36tl58]. For instance, 
an analysis using standard polyhedra of the following program will infer that 2x = y, while 
a template based approach would succeed in doing so only if a template of the form 2x — y 
has been provided: 



X 

w 


= y 

hile 


= 0; 
(true) 


{ 




X = 


x + 1; 




} 


y = 


y + 2; 
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We understand and share that criticism. In some cases there exist "natural" templates, 
such as intervals, or when dealing with timing or scheduling constraints, difference bounds 
Vi—Vj < C, but in general, finding the correct templates seems a hard problem. A suggestion 
is to run iterations using general convex polyhedra and look at the stable directions of the 
faces of these polyhedra. 

We think however that criticism of template domains should be part of wider consid- 
erations on how to choose the proper abstract domain. Abstract interpretation essentially 
replaces the unsolvable problem of computing the least inductive invariant of the concrete 
problem by the solvable problem of computing an inductive invariant in an abstract do- 
main — in some lucky cases such as the one dealt with in this article, actually obtaining 
the least inductive invariant in the abstract domain. The choice of the abstract domain is 
at present somewhat arbitrary, typically hardwired into the analysis tool, at best chosen by 
some command-line flags. 

Convex polyhedra [SHEHIEH] are popular because many programming idioms naturally 
exhibit convexity — for instance, the set of loop indices (z, j, k, . . .) of nested loops occur- 
ring in numerical analysis programs is often convex. Yet, one can easily think of programs 
where some interesting properties are not convex (a test |x| > 1, for instance). There are 
some cases where it is important not to enforce convexity and instead implement disjunc- 
tive domains, capable of representing properties such asx < — iVx > 1; an example is 
trace partitioning [93j. Thus, a static analysis tool based on general convex polyhedra also 
enforces an a priori convex shape that might not be representative of the useful program 
invariants. 

While convex polyhedra are not enough, they are often "too much": their complexity 
is too high for many applications. Indeed, even for less costly constraint domains such 
as octagons [TU |77j , it is simply too expensive to compute constraints between all visible 
program variables, so some analysers choose a priori to consider relations only between 
certain variables — an approach knowing as packing in the Astree tool [37^, "39] . Again, the 
packing choice is a form of a priori template guess made by the analyser, using heuristics 
that look at the way the program is organized. 

Further research is obviously needed in how to choose and adapt abstract domains so 
that they can represent interesting inductive properties at reasonable costs. This problem 
is similar to the problem of finding the correct predicates in predicate abstraction. For 
this, various methods for finding predicates in addition to those syntactically present in the 
program have been proposed, especially those based on the analysis of spurious counterex- 
amples {counterexample- guided abstraction refinement or CEGAR)^ ^Perhaps similar ideas 
could be employed for suggesting suitable additional numerical templates, finer-grained 
packing or disjunctions. 

8.4. Relational domains beyond polyhedra. In earlier works, we have proposed a 
method for obtaining input-output relationships of digital linear filters with memories, tak- 
ing into account the effects of floating-point computations [79] . This method computes an 
exact relationship between bounds on the input and bounds on the output, without the 
need for an abstract domain for expressing the local invariant; as such, for this class of 
problems, it is more precise than the method from this article. This technique, however. 



The literature on CEGAR is too vast to be cited here without unfairness. Clarke et al. 26 introduced 
this approach for symbolic model checking. Notable applications to software model checking include the 
Blast il2i and Slam i8j tools. 
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cannot be easily generalized to cases where the operator block contains tests and other 
nonlinear constructs; the semantics of nonlinear constructs must be approximated by e.g. 
interval analysis. 

There have been several published approaches to finding nonlinear relationships between 
program variables. One approach obtains polynomial equalities through computations on 
ideals using Grobner bases [931 [95]. This work only deals with equalities (not inequalities), 
uses a classical approach of computing output constraints from a set of input constraints 
(instead of finding relationships between the two sets of constraints) , and deals with loops 
using a widening operator. In comparison, our approach abstracts whole program fragments, 
and is modular — it is possible to "plug" the result of the analysis of a procedure at the 
location of a procedure call, though of course this is less precise than inlining the procedure. 

Since nonlinear relations are notoriously costly to compute upon, Bagnara et al. [7] 
have proposed using further abstraction to be able to reduce the problem to computations 
over convex polyhedra. 

Kapur [M] also proposed to use quantifier elimination to obtain invariants: he considers 
program invariants with parameters, and derives constraints over those parameters from 
the program. Our work improves on his by noting that least invariants of the chosen shape 
can be obtained, not just any invariant; that the abstraction can be done modularly and 
compositionally (a program fragment can be analysed, and the result of its analysis can be 
plugged into the analysis of a larger program), or combined into a "conventional" abstract 
interpretation framework (by using invariants of a shape compatible with that framework) , 
and that the resulting invariants can be "projected" to obtain numerical quantities. 

9. Conclusion and future prospects 

Writing static analysers by hand has long been found tedious and error-prone. One may 
of course prove an existing analyser correct through assisted proof techniques, which removes 
the possibility of soundness mistakes, at the expense of much increased tediousness. In this 
article, we proposed instead effective methods to synthesize abstract domains by automatic 
techniques. The advantages are twofold: new domains can be created much more easily, 
since no programming is involved; a single procedure, testable on independent examples, 
needs be written and possibly formally proved correct. To our knowledge, this is the first 
effective proposal for generating numerical abstract domains automatically, and one of the 
few methods for generating numerical summaries. Also, it is also the only method so far 
for computing summaries of floating-point functions. 

We have shown that floating-point computations could be safely abstracted using our 
method. The formulas produced are however fairly complex in this case, and we suspect that 
further over- approximation could dramatically reduce their size. There is also nowadays 
significant interest in automatizing, at least partially, the tedious proofs that computer 
arithmetic experts do and we think that the kind of methods described in this article could 
help in that respect. 

We have so far experimented with small examples, because the original goal of this work 
was the automatic, on-the-fly, synthesis of abstract transfer functions for small sequences 
of code that could be more precise than the usual composition of abstract of individual 
instructions, and less tedious for the analysis designer than the method of pattern-matching 
the code for "known" operators with known mathematical properties. A further goal is the 
precise analysis of longer sequences, including integer and Boolean computations. We have 
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shown in Sec. 4.4 how it was possible to partition the state space and abstract each region 
of the state-space separately; but naive partitioning according to n Booleans leads to 2"" 
regions, which can be unbearably costly and is unneeded in most cases. We think that 
automatic refinement and partitioning techniques |62| could be developed in that respect. 

The main practical application that we envision is to be able to analyse numerical 

operator blocks from synchronous programming languages such as SiMULiNKJ | SciCOs| | 

LustreJ I Scape] [or SaoJj which are widely used for programming control systems [3], 
particularly in the automative and avionic industries. In order to obtain good analysis 
precision, such blocks often have to be analysed as a whole instead of decomposing them 
into individual components and applying individual transfer functions, as in our rate limiter 
example. The static analysis tool Astree [151 EH EHl EHJ SSJ I104J outputs few, if any, 
false alarms on some classes of control programs because it has specific specialized transfer 
functions for certain operator blocks or coding patterns. Such transfer functions had to be 
implemented by hand; the techniques described in the present article could have been used 
to implement some of them automatically and even on-the-fly. 

There are two important drawbacks to our method, which make it currently only useful 
for very precise analysis of small parts of programs. The first is that we need to "see" the 
whole of the loop or function that we are analysing, the instructions of which must belong to 
the class of constructs that we are capable of dealing with, or at least can be abstracted by 
them. In contrast, iterative techniques are more tolerant: they see the program state locally, 
at each program point, and the numerical analysis may easily interact with other analyses, 
such as pointers |141 [T5] . The second issue is the high cost of quantifier elimination. Despite 
our work on new algorithms j81], in which we are still making progress, scalability remains 
an issue. 
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